home.social

#bson — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #bson, aggregated by home.social.

  1. #ITByte: A well-designed data format is dictated by what makes the information easiest for the intended audience to understand while exchanging data between two systems.

    #Data formats for the #Web - a short overview and comparison of #XML, #JSON, #BSON, #YAML and more...

    knowledgezone.co.in/posts/Data

  2. Designing like a senior: universal binarization

    Hello, my name is Dmitry Karlovsky, and I... well, it doesn't matter who I am. What matters is what I'm talking about and how I argue it. Those who know me need no introduction; those who don't have a wonderful opportunity to approach the question with a fresh mind. And that is extremely important if we want to design something genuinely well, rather than the usual way. So what on earth is VaryPack?

    habr.com/ru/articles/975020/

    #VaryPack #MsgPack #CBOR #BSON

  6. JSON? JSONB? BSON? CBOR? MsgPack? Ah, VaryPack!

    VaryPack is a new, simple, flexible, fast, and compact binary serialization format for arbitrary data. What's this trendy topic about?

    habr.com/ru/articles/966270/

    #VaryPack #MsgPack #CBOR #JSON #JSONB #BSON

  10. CW: Stop using JSON everywhere. [Long post]

    I remember an article describing how a company kept hitting AWS quotas: their JSON payload, which fit within the limits on its own, was placed in a string field inside another JSON object used for server-to-server communication, so every quote and backslash got double-escaped as \" and \\, inflating the payload size.
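The double-escaping bloat is easy to reproduce; a minimal Python sketch (field names are made up for illustration):

```python
import json

# Nesting a JSON document inside a string field of another JSON document
# forces every quote to be escaped, and each extra layer multiplies the
# escaping: " -> \" -> \\\" and so on.
inner = json.dumps({"user": "alice", "msg": "hi"})
once = json.dumps({"payload": inner})   # quotes become \"
twice = json.dumps({"payload": once})   # \" becomes \\\"

print(len(inner), len(once), len(twice))  # size grows with each layer
```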

    String escaping is also the reason why, for example, serde (the Rust de-/serialization library) can give you a Cow<str> when you want to avoid extra allocations: when a string contains no escapes, a reference into the original input can be returned, but when \" must be replaced with ", that part of the input has to be copied anyway.
    I'm not saying this is bad; it's how text formats work in general, not just JSON. But if you need to put arbitrary data inside objects, think twice: a binary format like MessagePack, BSON, or even a custom ProtoBuf schema will likely be much more efficient for your task.

    Text formats are also poorly suited to streaming, and loading one huge object into RAM is a very bad idea. If your data is an array, you can separate the objects with newlines instead of wrapping them in JSON's [ ]. Otherwise, look for a SAX-like library (search for something like "stream json") for your programming language.
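A minimal sketch of the newline-delimited alternative, using only Python's stdlib (StringIO stands in for a network stream or file):

```python
import io
import json

# Newline-delimited JSON: each line is a complete, small JSON document,
# so you parse one record at a time instead of a giant top-level array.
ndjson = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')

total = 0
for line in ndjson:            # constant memory: one record in RAM at a time
    record = json.loads(line)
    total += record["id"]

print(total)  # 6
```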

    I won't name a specific example, but I'm sure there are developers doing this: base64-encoding a file to send it inside a JSON request. Remember that base64 bloats the payload by roughly 1.33x [^1], so you should either send the file in a separate HTTP request or use multipart form data. Or encode your objects with a binary format. The last two options are fine when you're working with small files and insist on doing everything in one request; otherwise, upload the data in separate requests in parallel.

    [^1]: the base64-encoded length of an input of original_length bytes is:
    4 * ceil(original_length / 3)
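The footnote's formula can be sanity-checked against Python's stdlib encoder:

```python
import base64
from math import ceil

# base64 output (with padding) is always 4 * ceil(n / 3) bytes for n
# input bytes, i.e. roughly a 1.33x size increase.
for n in range(100):
    encoded = base64.b64encode(b"x" * n)
    assert len(encoded) == 4 * ceil(n / 3)
```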

    Another example of how definitely NOT to do it is Piped (a privacy frontend for YouTube). On some API endpoints it returns a nextpage object containing session info used to request the next page of a channel, a playlist, search results, or comments, and the problem is that it's a JSON object stuffed into a string, exactly as described above: "nextpage":"{\"url\":\"https…
    Even funnier, there is a body field inside this nextpage object that contains yet another JSON object, encoded in base64, so that's 3 layers of text-format encoding.
    And when a client requests the next page, the object is sent in the GET query string, so it gets urlencoded (percent-encoded), resulting in 4 layers! I don't know why browsers don't reject such long, ugly URLs.
    Everything before the query string is excusable if the internal YT API itself requires that format for a context/session object. Invidious doesn't care about context at all and sends a clean request, if I understood it correctly.

    And the most absurd JSON use case, I think, is JWT. It base64-encodes an already-plaintext format (base64 is meant for converting binary data to ASCII text; the same issue as in Piped, but we forgave that one), does so to two objects, and then stores the resulting token, with all that overhead, in cookies.
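A rough illustration of that overhead, using a made-up, unsigned header and payload (not a real token; real JWTs also append a signature segment):

```python
import base64
import json

# JWT base64url-encodes two JSON objects that were already plain text.
header = json.dumps({"alg": "HS256", "typ": "JWT"}, separators=(",", ":"))
payload = json.dumps({"sub": "1234567890", "name": "John Doe"}, separators=(",", ":"))

def b64url(s: str) -> str:
    # base64url without padding, as JWT uses
    return base64.urlsafe_b64encode(s.encode()).rstrip(b"=").decode()

token_body = b64url(header) + "." + b64url(payload)
print(len(header) + len(payload), "->", len(token_body))  # ~1.33x larger
```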

    By the way, want a JSON config for your software? Take a look at Hjson, which is much more convenient to write by hand.

    #json #msgpack #bson
    #web #performance #optimization
    #advice

  16. In #python, is #setuptools special and assumed to be always installed, even though it isn't part of the standard library? (a sort of phantom stdlib)

    e.g. is this setup.py file from #bson wrong, in that it imports setuptools but doesn't list it as a dependency?
    github.com/py-bson/bson/blob/m

    anyhow, the install blew up on my build
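One conventional fix (a sketch, assuming a PEP 518-style build) is to declare setuptools as an explicit build dependency in pyproject.toml instead of relying on it being preinstalled:

```toml
# pyproject.toml — declares setuptools as a *build* dependency, so
# installers fetch it before running setup.py (PEP 518).
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```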

  18. Do you know RFC 8949? No? Until this morning, to my great shame, neither did I... Yet the subject matters: a binary alternative that is compact, performant, standardized, and durable. #CBOR: Concise Binary Object Representation.
    cbor.io/

    The site's only tutorial points to a French introductory article by @bortzmeyer: bortzmeyer.org/7049.html

    #BSON, #protobuf, #MessagePack: each has its advantages (and drawbacks) compared with #JSON.
    CBOR is one color in that palette.
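To get a feel for why CBOR is compact, here is a toy Python sketch of the core of RFC 8949's encoding (only small non-negative ints, short text strings, and small maps; a real encoder covers far more):

```python
import json

# Each CBOR item starts with one byte holding a major type (high 3 bits)
# and a small length/value (low 5 bits); larger values use extra bytes.
def cbor_head(major: int, n: int) -> bytes:
    if n < 24:                       # value fits in the initial byte
        return bytes([(major << 5) | n])
    if n < 256:                      # one extra length byte
        return bytes([(major << 5) | 24, n])
    raise ValueError("sketch only handles small values")

def cbor_encode(obj) -> bytes:
    if isinstance(obj, int) and obj >= 0:
        return cbor_head(0, obj)                # major 0: unsigned int
    if isinstance(obj, str):
        data = obj.encode()
        return cbor_head(3, len(data)) + data   # major 3: text string
    if isinstance(obj, dict):
        out = cbor_head(5, len(obj))            # major 5: map
        for k, v in obj.items():
            out += cbor_encode(k) + cbor_encode(v)
        return out
    raise TypeError("sketch only handles int/str/dict")

doc = {"a": 1, "msg": "hi"}
print(len(cbor_encode(doc)), "vs", len(json.dumps(doc, separators=(",", ":"))))
```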

  19. AFAIK there is no way to export a simple #config of a #Ubiquiti #UniFi network.

    I've worked around it by:
    1. Downloading a backup
    2. Using a third party decrypt script to convert backup to zip
    3. Extracting files from zip
    4. Using #Mongo Tools to convert #BSON to #JSON
    5. Parsing in VSC

    But it is freaking painful, especially the parts where JSON has been squished into string fields.

    🤬! 🤬! 🤬!
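Un-squishing those stringified fields can be sketched in a few lines of Python (a hypothetical helper, not part of Mongo Tools; note it would also re-parse plain strings that happen to be valid JSON):

```python
import json

# Recursively re-parse string fields that themselves contain JSON,
# like the configs-squished-into-strings described above.
def unsquish(value):
    if isinstance(value, str):
        try:
            return unsquish(json.loads(value))
        except (ValueError, TypeError):
            return value                  # plain string, keep as-is
    if isinstance(value, dict):
        return {k: unsquish(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unsquish(v) for v in value]
    return value

squished = '{"cfg": "{\\"port\\": 8080, \\"tags\\": \\"[1, 2]\\"}"}'
print(unsquish(json.loads(squished)))
```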

  20. jsoncons is a C++ library for parsing JSON-like formats.

    jsoncons has a data model that allows parsing of different formats that resemble JSON via extensions. It provides several ways of interacting with parsed data: a queryable structure, a strongly typed C++ class, or a SAX-like parse stream. jsoncons is fast and has extensions for things like JSONPath.

    Website 🔗️: danielaparker.github.io/jsonco

  21. @alva @dkl even for embedded things... #BSON, #CBOR, #ProtocolBuffers, #UBJSON and "Smile" are existing things.

    Yes, you can write your own (or dump to disk whatever your environment thinks the current binary representation should look like), but then you have no tooling, no portability, and/or no validation/standardisation.