home.social

103 results for “cwensel”

  2. I'm hosting my #clusterless docs on github, but #google is refusing to index the site fully.

    it shows "currently not indexed" in the search console. which in turn claims they don't want to overload the site.

    I get the sense that this is a shared problem for anyone hosting on github..

    any suggestions on alternatives or hacks?

  7. Released a new wip of #clusterless, for #cloud #data pipelines, last night that includes reporting on both arcs (workloads) and datasets.

    github.com/ClusterlessHQ/clust

    Below is a summary of the three datasets the #AWS s3 log sample app creates.

    github.com/ClusterlessHQ/clust

    Note we track the difference between intervals that have no data (empty, which may be intentional) vs a gap (the workload didn't run and create data).
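
The empty-versus-gap distinction in the note above can be sketched roughly as follows. This is a minimal illustration, not how clusterless implements it: the `manifests` mapping (interval start to the list of files written), the five-minute interval, and the file names are all assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Status(Enum):
    COMPLETE = "complete"  # the workload ran and wrote data
    EMPTY = "empty"        # the workload ran and wrote nothing
    GAP = "gap"            # no record that the workload ran at all


def expected_intervals(start, end, step):
    """Yield the start time of every interval in [start, end)."""
    current = start
    while current < end:
        yield current
        current += step


def classify(start, end, step, manifests):
    """Map each expected interval to a Status.

    `manifests` is a hypothetical dict: interval start -> list of files
    the workload reported writing for that interval.
    """
    report = {}
    for interval in expected_intervals(start, end, step):
        if interval not in manifests:
            report[interval] = Status.GAP
        elif not manifests[interval]:
            report[interval] = Status.EMPTY
        else:
            report[interval] = Status.COMPLETE
    return report


start = datetime(2023, 1, 1, tzinfo=timezone.utc)
step = timedelta(minutes=5)
manifests = {
    start: ["part-0000.parquet"],  # ran, produced data
    start + step: [],              # ran, produced nothing
    # start + 2 * step is absent: the workload never reported
}
report = classify(start, start + 3 * step, step, manifests)
for interval, status in report.items():
    print(interval.isoformat(), status.value)
```

The point of keeping the two states apart is that an empty interval needs no action, while a gap marks an interval whose workload would have to be re-run.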

  11. getting closer...

    here is a screenshot of the cls command printing a summary table of workload (arc) completions since yesterday

    I need to release 2.0 of the library mini-parsers into maven central before I can push this out and begin work on dataset status (think fsck for workload results)

  12. hoping to make time to get another release out this week.

    I have commands to list deployed placements (regions etc), projects, and arcs (workloads). still need to get deployed datasets.

    and, status reporting of both arcs and datasets.

    that is, completed and failed arcs. and dataset completions, partials, empties, and gaps.

    if a gap is found, the arc was skipped or failed. this is where you can re-run workloads deterministically, from the cli.

  13. I'm thinking of resurrecting some code I have for Splunk like relative time adjusters

    docs.splunk.com/Documentation/

    the library mini-parsers is due for an update, modern parboiled supports jdk17 now.

    github.com/Heretical/mini-pars

    and status reporting needs time range support on the cli and the splunk syntax is fairly concise.

    anyone else interested in parser support?
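
For readers unfamiliar with the Splunk syntax mentioned above, here is a rough sketch of what a relative time adjuster does. It covers only a small subset (an optional signed offset such as `-1d`, followed by an optional snap such as `@d`, with units `s`, `m`, `h`, `d`) and is not the mini-parsers API; the function names are invented for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified subset of Splunk-style modifiers: [+|-]<count><unit>[@<snap unit>]
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_PATTERN = re.compile(
    r"^(?:(?P<sign>[+-])(?P<count>\d+)(?P<unit>[smhd]))?(?:@(?P<snap>[smhd]))?$"
)


def snap_to(moment, unit):
    """Round `moment` down to the start of the given unit."""
    if unit == "s":
        return moment.replace(microsecond=0)
    if unit == "m":
        return moment.replace(second=0, microsecond=0)
    if unit == "h":
        return moment.replace(minute=0, second=0, microsecond=0)
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def adjust(expression, now):
    """Apply a relative time expression to `now`: offset first, then snap."""
    if expression == "now":
        return now
    match = _PATTERN.match(expression)
    if not expression or not match:
        raise ValueError(f"unsupported time expression: {expression!r}")
    result = now
    if match["count"]:
        delta = timedelta(**{_UNITS[match["unit"]]: int(match["count"])})
        result = result + delta if match["sign"] == "+" else result - delta
    if match["snap"]:
        result = snap_to(result, match["snap"])
    return result


now = datetime(2023, 6, 15, 14, 37, 20, tzinfo=timezone.utc)
print(adjust("-1d@d", now))  # start of yesterday
print(adjust("-15m", now))   # fifteen minutes ago
print(adjust("@h", now))     # top of the current hour
```

A CLI time range could then be expressed as a pair of such expressions, for example an earliest of `-1d@d` and a latest of `@d` for "all of yesterday".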

  14. ok, finally! the and wip builds are published to

    github.com/ClusterlessHQ/homeb

    I'll update all the install docs this week.

  15. here is a little pre-announcement of a new library clusterless-commons

    github.com/ClusterlessHQ/clust

    currently available in maven central.. but still under-documented etc.

    this project allows for sharing of some core libraries I find useful developing clusterless and tessellate. as well as some basics to help with cdk development.

    I'll make a bigger announcement as it matures.

  16. I love how adds stuff to your agenda for you, and you can't remove it.

    Now I'll ignore the agenda I just crafted.

  17. probably time I sort out a real logo for #clusterless

    github.com/ClusterlessHQ

    is 99designs still a thing?

  18. I've added a new how-to guide on creating a copy pipeline in s3 using only intrinsic components. As files get uploaded, they get copied to a new location.

    docs.clusterless.io/guide/1.0-

    This roughly mirrors the example project, but has a bunch more explainers and examples on using the cls command to build a project file.

    github.com/ClusterlessHQ

  19. Long weekend of yard work ahead but look forward to completing a set of improved documentation early next week.

    Using jackson json views I can print out json for required properties and full json so configuring a simple pipeline doesn’t seem so daunting.

    These in turn can be embedded directly in the docs online and help messages.

  20. I added include/exclude filters to the S3PutListenerBoundary and S3CopyArc components in #clusterless.

    Now you can use Ant-like paths to exclude hidden files etc. in s3 buckets, like _SUCCESS with an exclude on **/_*

    docs.clusterless.io/reference/

    docs.clusterless.io/reference/
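
To show what an exclude such as `**/_*` selects, here is a small sketch of Ant-like matching. It is a simplified approximation of Ant path semantics (`**/` spans zero or more directories, `*` and `?` stay within one path segment) and says nothing about how the clusterless components evaluate their filters; `ant_to_regex`, `exclude_keys`, and the sample keys are made up for the example.

```python
import re


def ant_to_regex(pattern):
    """Translate a simplified Ant-like path pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")           # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")        # any run of characters in one segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")         # exactly one character in one segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def exclude_keys(keys, patterns):
    """Return the keys that match none of the exclude patterns."""
    regexes = [ant_to_regex(p) for p in patterns]
    return [k for k in keys if not any(r.fullmatch(k) for r in regexes)]


keys = [
    "logs/2023/part-0000.csv",
    "logs/2023/_SUCCESS",
    "_SUCCESS",
    "logs/_tmp/part-0001.csv",
]
kept = exclude_keys(keys, ["**/_*"])
print(kept)
```

In this sketch `**/_*` only tests the final path segment, so a file under a `_tmp/` directory is kept; excluding it would take a second pattern such as `**/_*/**`.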

  21. How's this for a tag line?

    Think + without the airflow, but with a lot more trust and agility.

  22. I've started publishing how-tos on using #clusterless to manage pipelines.

    A little overkill, but the first is how to manage an s3 bucket.

    docs.clusterless.io/guide/1.0-

  23. Having a scenario runner for flows in #clusterless is pretty cool for automated testing of dags of workloads.

    especially if they are part of your ci/cd

    github.com/ClusterlessHQ/clust

    on every commit, a suite of scenarios are deployed, run, and destroyed against

    github.com/ClusterlessHQ/clust

  24. would be great to have time (be paid) to create a sample app on to put data behind a WASM frontend

    duckdb.org/2021/10/29/duckdb-w

    So instead of Athena/Glue integration that can work against a complete corpus, have DuckDb over the last 30 days for investigations etc

    github.com/ClusterlessHQ/aws-s

    quick reminder, chris.wensel.net

    I want to do this with datasette.io as well (I did it in a previous role and it was awesome)

  25. Just pushed a new Tessellate release that updates the transform statement syntax to include intrinsic functions.

    The first function is tsid, a unique long value generated by the github.com/f4b6a3/tsid-creator library.

    More here: github.com/ClusterlessHQ/tesse

    github.com/ClusterlessHQ/tesse

  26. @Cmastication depends on how you access it? If via a query, only partition on the most common predicates.

    Repartitioning data for different access patterns is a key use case behind #clusterless and tessellate. See bio for links.

    Otherwise yeah, partition via hash to get equal sized bits. Reminds me to add a hash transform to tessellate.
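
The hash-partitioning suggestion above could look something like this. It is a generic sketch, not a tessellate transform: the choice of SHA-256, the eight partitions, and the synthetic `user-N` keys are assumptions for illustration.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Assign a key to a partition using a stable hash.

    hashlib is used instead of the built-in hash() because the latter is
    salted per process for strings, so it would not give repeatable layouts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


# Spread 10,000 synthetic keys over 8 partitions and inspect the sizes.
counts = Counter(partition_for(f"user-{n}", 8) for n in range(10_000))
for partition in sorted(counts):
    print(partition, counts[partition])
```

Because the assignment ignores the key's meaning, partition sizes come out roughly equal, but queries can no longer prune partitions by predicate, which is the trade-off the reply describes.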

  27. I'll make a bigger announcement later, but if you are following along with development, note that we just added Glue/Athena support.

    That means databases and tables can be deployed in tandem with a workload, and any new partitions that arrive will be added to the table.

    github.com/ClusterlessHQ/clust

    This example has been updated to show how it works and how simple it is (relatively).

    github.com/ClusterlessHQ/aws-s

    So imagine, every result dataset in the dag having a table to query.

  28. just sayin', if you find chaining sql statements into a data processing dag a bit of a drag, I suggest you spend some time with #clusterless

    github.com/ClusterlessHQ

    declarative decentralized heterogeneous flows (in today)