Search
103 results for “cwensel”
-
@cwensel yep, and going back to your toot on open source sustainability — #b2evolution with a great project and a wonderful community shuttered after 18 years
Community and adoption and even commercial backers don’t always add up to forever sustainability
-
So Tessellate inherits lots of support for various data formats from Cascading
https://github.com/cwensel/cascading
Even though #apacheparquet dropped Cascading support, we were able to port it over.
Now that Parquet is native to Cascading, it should be easier to add #apacheiceberg support.
This would allow #clusterless to convert data as it arrives into Iceberg continuously for use in #aws Athena or other data front-ends.
Anyone interested in a challenge?
-
A little more color on this announcement..
https://fosstodon.org/@cwensel/110549001614086663
First, #ApacheParquet removed #Cascading support, so I had to splice the original source into Cascading. But the ParquetScheme didn't fully honor type information. So there is a new TypedParquetScheme that has native support for JSON and Timestamps.
Second, Parquet requires the #ApacheHadoop FileSystem, which means we get the wonderful S3A implementation. But we also get a 331MB jar dependency with the aws bundle.
-
Many thanks to @czds @RuthMalan @kcarruthers @EmilyK @cwensel @douglasvb @digikata @deborahh @Sevoris for the boosts and favoriting of this thread for #IoTday #IoTday2024
https://mastodon.social/@jadp/112242993601272319
-
Hey all, I'm hiring for a #DataEng (and #DevOps) role in the SF Bay Area. Reach out directly to me for more info.
#Java background is a strong want plus AWS and a desire to work on #OpenSource stuff.
-
so my #Grafana #k6 cloud runs seem to initialize the SharedArray object multiple times and pass different instances to the remote processes. It inits once locally, and historically I don't remember this being an issue in the cloud.
https://grafana.com/docs/k6/latest/javascript-api/k6-data/sharedarray/
I have support and Slack questions open, but I'd find it odd if I'm the only person experiencing this.
-
updated #clusterless subpop cli build to provide a Homebrew tap for easy installation.
https://github.com/ClusterlessHQ/subpop
subpop is an experimental tool for diffing datasets from the cli.
runs on #Linux and #macOS but sadly written in #java so no native binaries just yet.
-
Has no one ever read Hyperion/Endymion?
Fun project "Last digital common ancestor"
Self-replicating, self-modifying Assembly program that can evolve into every possible computer program in the universe.
https://github.com/mertyildiran/ldca
#Assembly #OpenSource #github #fasm
-
while pondering my need for a remote compute environment, vs having random boxes littered about generating heat, I realized I could add a 'device' component concept to #Clusterless.
this concept not only complements the current model types, it will be handy standalone.
consider an #AWS ec2/ecs instance doing some complex work and dropping files into S3 (over the new mount point feature) where a clusterless DAG takes over processing when the files arrive (via the S3 put boundary).
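a rough sketch of that hand-off, to make the flow concrete. this is not Clusterless or AWS code: Bucket, on_put, run_device, and demo are hypothetical names, and an in-memory dict stands in for S3. it only assumes that each put fires an arrival event and that a downstream step reacts to it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: Bucket, on_put, run_device, and demo are
# illustrations only, not Clusterless or AWS APIs.


@dataclass
class Bucket:
    """In-memory stand-in for an S3 bucket that notifies listeners on put."""

    objects: Dict[str, bytes] = field(default_factory=dict)
    listeners: List[Callable[[str, bytes], None]] = field(default_factory=list)

    def on_put(self, listener: Callable[[str, bytes], None]) -> None:
        # Register a callback that plays the role of the put boundary.
        self.listeners.append(listener)

    def put(self, key: str, body: bytes) -> None:
        # Store the object, then fire an arrival event for every listener.
        self.objects[key] = body
        for listener in self.listeners:
            listener(key, body)


def run_device(bucket: Bucket) -> None:
    """The 'device': does some work and drops result files into the bucket."""
    for i in range(3):
        bucket.put(f"output/part-{i}.txt", f"record {i}".encode())


def demo() -> List[str]:
    """Wire a downstream step to arrivals, run the device, return what ran."""
    processed: List[str] = []
    bucket = Bucket()
    # The downstream step runs once per arriving file and records key and size.
    bucket.on_put(lambda key, body: processed.append(f"{key}:{len(body)}"))
    run_device(bucket)
    return processed


if __name__ == "__main__":
    for line in demo():
        print(line)
```

the point of the shape: the device knows nothing about what happens downstream, it only writes files, and the arrival event is the sole coupling between the two sides.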
-
On that note, not using #clusterless for #dataengineering is a trap
-
@seldo I don't know any firsthand,
but I spent the last couple weeks exploring what a #RAG pipeline would look like so I could write a sample application/pipeline using my #OpenSource #clusterless project
https://github.com/ClusterlessHQ
unfortunately the idea I had wasn't ultimately suitable for RAG and could be a simple BERT/BART summarizer pipeline without having an open/elasticsearch backend or other vector db.
still looking for a fun RAG based prototype I could build and share.