home.social

103 results for “cwensel”

  1. @cwensel yep, and going back to your toot on open source sustainability: #b2evolution, a great project with a wonderful community, shuttered after 18 years.

    Community, adoption, and even commercial backers don’t always add up to sustainability forever.

  2. So Tessellate inherits lots of support for various data formats from Cascading
    github.com/cwensel/cascading

    Even though Parquet dropped Cascading support, we were able to port it over.

    Now that Parquet is native to Cascading, it should be easier to add support.

    This would allow data to be converted into Iceberg continuously as it arrives, for use in Athena or other data front-ends.

    Anyone interested in a challenge?

  3. A little more color on this announcement:
    fosstodon.org/@cwensel/1105490

    First, Parquet removed its Cascading support, so I had to splice the original source into Cascading. But the ParquetScheme didn't fully honor type information, so there is a new TypedParquetScheme that has native support for JSON and timestamps.

    Second, Parquet requires the Hadoop FileSystem, which means we get the wonderful S3A implementation. But we also get a 331MB jar dependency with the AWS bundle.

  4. Hey all, I'm hiring for a #DataEng (and #DevOps) role in the SF Bay Area. Reach out directly to me for more info.

    A #Java background is a strong want, plus AWS and a desire to work on #OpenSource stuff.

  9. so my cloud runs seem to initialize the SharedArray object multiple times and pass different instances to the remote processes. It inits once locally, and historically I don't remember this being an issue in the cloud.

    grafana.com/docs/k6/latest/jav

    I have support and Slack questions open, but I find it odd if I'm the only person experiencing this.

  10. updated #clusterless subpop CLI build to provide a Homebrew tap for easy installation.

    github.com/ClusterlessHQ/subpop

    subpop is an experimental tool for diffing datasets from the CLI.

    runs on #Linux and #macOS, but sadly it's written in #java, so no native binaries just yet.

  15. Has no one ever read Hyperion/Endymion?

    Fun project "Last digital common ancestor"
    Self-replicating, self-modifying Assembly program that can evolve into every possible computer program in the universe.
    github.com/mertyildiran/ldca

    infosec.exchange/@ankit_anubha

  16. while pondering my need for a remote compute environment, versus having random boxes littered about generating heat, I realized I could add a 'device' component concept to #Clusterless.

    this concept not only complements the current model types, but will also be handy standalone.

    consider an #AWS EC2/ECS instance doing some complex work and dropping files into S3 (over the new mount point feature), where a clusterless DAG takes over processing when the files arrive (via the S3 put boundary).

  21. @seldo I don't know any firsthand,

    but I spent the last couple of weeks exploring what a #RAG pipeline would look like so I could write a sample application/pipeline using my #OpenSource #clusterless project

    github.com/ClusterlessHQ

    unfortunately the idea I had wasn't ultimately suitable for RAG; it could be a simple BERT/BART summarizer pipeline without an OpenSearch/Elasticsearch backend or other vector DB.

    still looking for a fun RAG-based prototype I could build and share.

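
The 'device' post describes a hand-off: a compute instance drops files into S3, and a Clusterless DAG starts processing when each file arrives via the S3 put boundary. Here is a minimal, self-contained Java sketch of that event-driven pattern, assuming a `BlockingQueue` as a stand-in for S3 PUT notifications and a plain chain of functions as a stand-in for the DAG. Every class and method name in it (`ArrivalBoundary`, `onPut`, `simulate`) is hypothetical and is not the Clusterless API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch of an arrival boundary; not the Clusterless API.
// A BlockingQueue stands in for S3 PUT event notifications.
public class ArrivalBoundarySketch {

    // Stand-in for an S3 PUT notification: only the object key.
    record PutEvent(String key) {}

    // The "device" side publishes an event for every object it writes.
    static final class ArrivalBoundary {
        private final BlockingQueue<PutEvent> events = new LinkedBlockingQueue<>();

        void onPut(String key) { events.add(new PutEvent(key)); }

        PutEvent take() throws InterruptedException { return events.take(); }
    }

    // Runs the downstream "DAG" (here a linear chain of two steps) once per arrival.
    public static List<String> simulate(List<String> deviceOutputs) throws InterruptedException {
        ArrivalBoundary boundary = new ArrivalBoundary();
        Function<String, String> parse = key -> "parsed:" + key;
        Function<String, String> dag = parse.andThen(s -> s.toUpperCase(Locale.ROOT));

        List<String> results = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                // Block until each expected object arrives, then process it.
                for (int i = 0; i < deviceOutputs.size(); i++) {
                    results.add(dag.apply(boundary.take().key()));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // The device writes its files; each write fires a boundary event.
        for (String key : deviceOutputs) {
            boundary.onPut(key);
        }
        consumer.join(); // join() also makes the consumer's writes to results visible here
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(simulate(List.of("batch-001.csv", "batch-002.csv")));
    }
}
```

Running `main` prints `[PARSED:BATCH-001.CSV, PARSED:BATCH-002.CSV]`. A real deployment would swap the in-memory queue for actual S3 event notifications and the function chain for a proper DAG; the sketch only shows the "process when the file lands" control flow.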
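
The Parquet post says the original ParquetScheme didn't fully honor type information, which is why a TypedParquetScheme was added. As a rough illustration of what honoring type information means, here is a minimal Java sketch that turns raw Parquet physical values into typed Java values based on a column's logical-type annotation. It assumes the Parquet format's TIMESTAMP logical type (an INT64 count of MILLIS, MICROS, or NANOS since the Unix epoch, treated here as UTC-adjusted) and its JSON logical type (UTF-8 text stored in a BYTE_ARRAY). The `Logical` enum and `toTyped` method are hypothetical and are not the TypedParquetScheme or parquet-java API; JSON is returned as text, where a fuller scheme might parse it into a JSON tree.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch; not the TypedParquetScheme or parquet-java API.
public class LogicalTypeSketch {

    // Hypothetical tag for the logical-type annotation found in a Parquet schema.
    enum Logical { NONE, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, TIMESTAMP_NANOS, JSON }

    // Maps a raw physical value (INT64 as Long, BYTE_ARRAY as byte[]) to a typed value.
    // Timestamps are assumed to be UTC-adjusted, so they map to Instant.
    static Object toTyped(Object physical, Logical logical) {
        switch (logical) {
            case TIMESTAMP_MILLIS:
                return Instant.ofEpochMilli((Long) physical);
            case TIMESTAMP_MICROS:
                return Instant.EPOCH.plus((Long) physical, ChronoUnit.MICROS);
            case TIMESTAMP_NANOS:
                return Instant.EPOCH.plus((Long) physical, ChronoUnit.NANOS);
            case JSON:
                // Kept as JSON text; a fuller implementation might parse it.
                return new String((byte[]) physical, StandardCharsets.UTF_8);
            default:
                // Without a logical annotation the physical value passes through untouched.
                return physical;
        }
    }

    public static void main(String[] args) {
        long micros = 1_700_000_000_000_000L;
        System.out.println(toTyped(micros, Logical.NONE));             // raw long
        System.out.println(toTyped(micros, Logical.TIMESTAMP_MICROS)); // Instant
    }
}
```

The point of the design is that the same INT64 column reads as a plain `long` or as an `Instant` depending solely on the schema annotation, which is exactly the information a scheme that ignores logical types would lose.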