home.social

#ndjson — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #ndjson, aggregated by home.social.

  1. 🛠️ Tool

    Opening: gws is a community CLI that exposes Google Workspace APIs through a single, dynamically generated command surface. It queries Google's Discovery Service at runtime to enumerate available APIs and methods, then presents them with structured JSON outputs suitable for programmatic consumption and LLM agents.

    Key Features:
    • Dynamic API surface: Commands are constructed from the live Discovery Service metadata rather than a static command list, so new Google endpoints appear without manual updates.
    • Structured output: Responses are returned as JSON and support streaming as NDJSON for large paginated datasets.
    • Schema introspection: Built‑in support for inspecting request/response schemas (conceptually, a gws schema ... command) to reveal method parameters and response fields.
    • Agent integration: Ships with 40+ AI agent skills to enable LLM-driven automation against Workspace resources.
    • Workspace coverage: Designed to work across Drive, Gmail, Calendar, Sheets, Chat and other Workspace APIs.
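
    The structured-output bullet above can be pictured with a minimal sketch: items from a paginated API are flattened into one JSON object per line. This is not gws code. The fetch_page callable and the items / nextPageToken field names are assumptions chosen for illustration.

```python
import io
import json
from typing import Callable, Iterable, Iterator, Optional, TextIO


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield items from every page, following an assumed 'nextPageToken' field."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if not token:
            break


def write_ndjson(items: Iterable[dict], out: TextIO) -> int:
    """Write one compact JSON document per line and return the line count."""
    count = 0
    for item in items:
        out.write(json.dumps(item, separators=(",", ":")) + "\n")
        count += 1
    return count


# Hypothetical two-page response, used only to exercise the sketch.
_FAKE_PAGES = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "nextPageToken": "p2"},
    "p2": {"items": [{"id": "c"}]},
}


def fake_fetch(token: Optional[str]) -> dict:
    return _FAKE_PAGES[token]


buf = io.StringIO()
written = write_ndjson(iter_items(fake_fetch), buf)
ndjson_text = buf.getvalue()
```

    Because each line is a complete JSON document, a consumer can start processing before the last page has been fetched.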

    Technical implementation:
    • The tool relies on Google's public Discovery Service to build its CLI model at runtime and map endpoints to uniform command semantics.
    • Output is intentionally structured for machine parsing (JSON/NDJSON), facilitating orchestration by external automation or LLM agents.
    • Authentication requires OAuth credentials bound to a Google Cloud project; the CLI provides authentication workflows to obtain the tokens used for API calls.
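
    As a rough illustration of building a command surface from Discovery metadata, the sketch below walks a document shaped like a Discovery response (nested resources, each with methods) and derives command names. SAMPLE_DOC is a trimmed, hand-written stand-in, and the space-separated command naming is an assumption, not gws's actual syntax.

```python
from typing import Iterator, Tuple


def iter_methods(resources: dict, prefix: Tuple[str, ...] = ()) -> Iterator[tuple]:
    """Walk a Discovery-style 'resources' tree, yielding (command_path, method_info)."""
    for res_name, res in sorted(resources.items()):
        path = prefix + (res_name,)
        for meth_name, meth in sorted(res.get("methods", {}).items()):
            yield path + (meth_name,), meth
        # Resources can nest, so recurse into any child resources.
        yield from iter_methods(res.get("resources", {}), path)


# Hand-written sample in the shape of a Discovery document (illustrative only).
SAMPLE_DOC = {
    "name": "drive",
    "version": "v3",
    "resources": {
        "files": {
            "methods": {
                "list": {
                    "httpMethod": "GET",
                    "path": "files",
                    "parameters": {"pageSize": {"type": "integer"}},
                },
                "get": {
                    "httpMethod": "GET",
                    "path": "files/{fileId}",
                    "parameters": {"fileId": {"type": "string", "required": True}},
                },
            }
        }
    },
}

# Map each derived command name to the parameter names it would accept.
commands = {
    " ".join((SAMPLE_DOC["name"],) + path): sorted(info.get("parameters", {}))
    for path, info in iter_methods(SAMPLE_DOC["resources"])
}
```

    Since the command table is computed from metadata, a method added to the document shows up in commands without any code change, which is the property the bullets above describe.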

    Use cases:
    • Automating mailbox or Drive data exports into downstream pipelines.
    • Letting LLM agents manage calendar invites or create spreadsheets using structured JSON responses.
    • Rapidly scripting ad‑hoc queries against newly added Workspace methods without waiting for CLI updates.

    Limitations:
    • Not an officially supported Google product; operational stability and breaking changes are possible as the project evolves toward v1.0.
    • Runtime reliance on the Discovery Service implies behavior changes if Google alters the service or method metadata.
    • OAuth and Google Cloud project requirements remain prerequisites for access to organizational data.

    References:
    • Key technical terms: Discovery Service, NDJSON, OAuth, schema introspection.

    🔹 tool #gws #googleworkspace #api #ndjson

    🔗 Source: github.com/googleworkspace/cli

  2. I just came across a file format called #ndjson - just a few months after I read about #jsonl - and I thought, seriously? I've been using this format for years, and now all of a sudden it has two different names, within months of each other?

    Sometimes they say things come in threes...

  3. zeehaven – a tiny tool to convert data for social media research: publicdatalab.org/2023/12/18/z

    drag and drop ndjson data gathered with @dmi's zeeschuimer tool, and download a csv file. ✨📦✨

    #digitalmethods #csv #ndjson #tinytools #osint #newmedia #commodon #mediastudies #data #dataviz
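
    For readers who want to see what an NDJSON-to-CSV conversion involves, here is a minimal sketch. It is not zeehaven's implementation, the sample records are made up, and it assumes flat records (nested objects would need flattening first).

```python
import csv
import io
import json


def ndjson_to_csv(ndjson_text: str) -> str:
    """Convert NDJSON text to CSV, using the union of keys as columns."""
    rows = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    # Collect column names in first-seen order so output is deterministic.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return out.getvalue()


# Made-up sample records; the second one has an extra field and a comma in the text.
sample = '{"id": 1, "text": "hello"}\n{"id": 2, "text": "a, b", "lang": "en"}\n'
csv_text = ndjson_to_csv(sample)
```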

  4. My family of packages for streaming JSON objects with NDJSON over RESTful APIs in the .NET ecosystem gained a new member - one that brings support for ASP.NET Core Minimal APIs 🎉

    #AspNetCore #MinimalApi #AsyncStreams #NDJSON

    github.com/tpeczek/Ndjson.Asyn

  5. AspNetCore.JsonStreamer 0.2.0 is out now!

    A JSON Lines streaming serializer for ASP.NET Core that uses the standard IAsyncEnumerable<T> type.

    It is simple to use: call AddNewtonsoftJsonStreamer() instead of AddNewtonsoftJson().

    github.com/kekyo/AspNetCore.Js

    #dotnet #aspnetcore #jsonlines #ndjson

  6. I've added request cancellation capabilities to all my NDJSON and JSON streaming samples (ASP.NET Core, Blazor, .NET Console).

    #DotNet #AspNetCore #Blazor #NDJSON #AsyncStreams

    github.com/tpeczek/Demo.Ndjson

  7. @bdelacretaz I also came across github.com/ndjson/ndjson-spec which seems like the same thing, developed independently

  8. I never bothered with optimizing the parsing of #jsonl #ndjson files because in most cases it was a one-off task before I put the data into a database or parquet file. But the files got bigger, and waiting 20 minutes for the data to load made me reconsider my decision. So I tried some different approaches.

    Tested with a 1.7 GB file of 300 k Tweets.

    jsonlines.reader: 17.2 seconds 100%
    orjson: 6.49 seconds 37%
    msgspec: 3.06 seconds 17%

    I like how orjson cuts the time by two thirds without the need to change anything else. Just use it as a drop in replacement and you are good.

    msgspec is twice as fast as orjson or six times as fast as jsonlines if you define the schema of the data that you want. For Tweets that's okay, as I can reuse the schema many times. With data that is used only once, I prefer orjson.

    Memory usage was nearly identical across the different solutions. Probably because they all parse the data per line. I restarted the kernel each time to get comparable numbers.
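
    A minimal sketch of the per-line parsing pattern with orjson as a drop-in for the standard json module. This is not the benchmark code from the post. The read_ndjson helper and the sample records are made up, and the sketch falls back to the stdlib if orjson is not installed.

```python
import io
import json

try:
    import orjson  # third-party and optional for this sketch

    loads = orjson.loads
except ImportError:
    loads = json.loads  # stdlib fallback with the same call shape


def read_ndjson(fileobj):
    """Parse one JSON document per line, skipping blank lines."""
    for line in fileobj:
        if line.strip():
            yield loads(line)


# Made-up sample standing in for a file opened in binary mode.
sample = io.BytesIO(b'{"id": "1", "text": "first"}\n\n{"id": "2", "text": "second"}\n')
tweets = list(read_ndjson(sample))
```

    Only one line is held in memory while it is parsed, which would be consistent with the similar memory usage reported above.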

    Load time for all 23 million Tweets in the dataset was reduced from 25 to 4 minutes.

    This blog post was useful to me: pythonspeed.com/articles/faste #Python #DataEngineering

  9. [#tools] ndjson-apply: apply a JS function to a stream of newline-delimited JSON
    github.com/maxlath/ndjson-appl #nodejs #NDJSON