#clusterless — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #clusterless, aggregated by home.social.
-
updated #clusterless subpop cli build to provide a Homebrew tap for easy installation.
https://github.com/ClusterlessHQ/subpop
subpop is an experimental tool for diffing datasets from the cli.
runs on #Linux and #macOS but sadly written in #java so no native binaries just yet.
-
while pondering my need for a remote compute environment, vs having random boxes littered about generating heat, I realized I could add a 'device' component concept to #Clusterless.
this concept not only complements the current model types, it will be handy standalone.
consider an #AWS ec2/ecs instance doing some complex work and dropping files into S3 (over the new mount point feature) where a clusterless DAG takes over processing when the files arrive (via the S3 put boundary).
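A minimal sketch of the "files arrive, DAG takes over" hand-off, assuming the standard S3 event notification JSON shape. This is not how Clusterless implements its S3 put boundary; the function name `arrived_objects`, the bucket name, and the `incoming/` prefix are all hypothetical, chosen only for illustration.

```python
from urllib.parse import unquote_plus


def arrived_objects(event: dict, prefix: str = "") -> list[str]:
    """Return s3:// URIs for newly created objects under `prefix`.

    `event` is assumed to follow the S3 event notification layout
    (a top-level "Records" list). Anything that is not an object
    creation event is ignored.
    """
    uris = []
    for record in event.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in notifications (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            uris.append(f"s3://{bucket}/{key}")
    return uris


# Hypothetical event: a device wrote one file, and one unrelated delete happened.
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "device-output"},
                "object": {"key": "incoming/part+0001.csv"},
            },
        },
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "device-output"},
                "object": {"key": "incoming/old.csv"},
            },
        },
    ]
}

new_files = arrived_objects(sample_event, prefix="incoming/")
```

Filtering on the event name and a key prefix keeps the downstream step from reacting to deletes or to files written elsewhere in the same bucket.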
-
On that note, not using #clusterless for #dataengineering is a trap
-
@seldo I don't know any firsthand,
but I spent the last couple weeks exploring what a #RAG pipeline would look like so I could write a sample application/pipeline using my #OpenSource #clusterless project
https://github.com/ClusterlessHQ
unfortunately the idea I had wasn't ultimately suitable for RAG and could be a simple BERT/BART summarizer pipeline without having an open/elasticsearch backend or other vector db.
still looking for a fun RAG-based prototype I could build and share.
-
need to dig into this, but i've been doing replay (redrive) on #aws StepFunctions for years with my #data pipelines
replay is one feature I haven't added back to #Clusterless yet, though all the metadata is there.
-
currently all my #clusterless examples (and scenario tester) use jsonnet, but it's got weak overall support.
CUE looks interesting, but no Java implementation for embedding (if that was a thing I was considering)
-
Tessellate is now on Docker Hub
https://hub.docker.com/r/clusterless/tessellate
Tessellate is a command line tool for reading and writing #data to/from multiple locations and across multiple formats.
-
Automating #AWS CloudWatch log export into S3 is no simple task.
Next #clusterless release will have a new Component type called Activity, which is simply a scheduled task.
The first Activity will be a function that exports CloudWatch logs created within the previous interval.
As they arrive, any arc can subscribe to the data drop and do things. To simplify that task, I'll update #tessellate
The exported CloudWatch log is a delimited text file with two columns, one of which is JSON. unlike all the others in AWS!
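A rough sketch of the two moving parts described above: working out "the previous interval" as a from/to pair in epoch milliseconds (the unit CloudWatch Logs export requests use), and splitting one exported line into its two columns. The function names are hypothetical, and the layout (a timestamp column, a single-space delimiter, then the JSON column) is an assumption; the post only says there are two columns and one is JSON.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def previous_interval(now: datetime, minutes: int) -> tuple[int, int]:
    """Return (from_ms, to_ms) for the last fully elapsed interval before `now`."""
    step = timedelta(minutes=minutes)
    # Floor `now` to the nearest interval boundary, then step back one interval.
    end = EPOCH + ((now - EPOCH) // step) * step
    start = end - step
    one_ms = timedelta(milliseconds=1)
    return (start - EPOCH) // one_ms, (end - EPOCH) // one_ms


def parse_export_line(line: str, delimiter: str = " ") -> tuple[str, dict]:
    """Split a two-column line into (first column, parsed JSON column).

    Assumes the first column contains no delimiter and the rest of the line
    is a JSON object. Raises json.JSONDecodeError if the second column is
    not valid JSON.
    """
    first, _, payload = line.rstrip("\n").partition(delimiter)
    return first, json.loads(payload)


# Hypothetical values for illustration only.
now = datetime(2024, 1, 1, 12, 7, 30, tzinfo=timezone.utc)
from_ms, to_ms = previous_interval(now, minutes=5)

timestamp, message = parse_export_line(
    '2024-01-01T12:00:01.000Z {"level": "info", "detail": {"orderId": 42}}\n'
)
```

Splitting on the first delimiter only (via `partition`) matters here, since the JSON column itself will usually contain spaces.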
-
ok, here's a new one for #aws users.
would anyone be interested in an automated way to extract CloudWatch logs (continuously) into an s3 bucket,
and have them converted into #parquet (etc.) for downstream custom processing, or simply partitioned, with partition updates to AWS Athena/Glue?
the challenge for users is getting the `detail` json field exposed since it's app specific.
with #clusterless, devs could then inject custom processing for custom app logs into the #data pipeline
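One way the app-specific `detail` field could be exposed, sketched under assumptions: nested JSON is flattened into dotted column names that a columnar writer could pick up, and a Hive-style `year=/month=/day=` prefix is derived from the event time, which is a common layout for Athena/Glue partitions. The record contents and both function names are hypothetical; this is not the Clusterless or Tessellate implementation.

```python
from datetime import datetime, timezone


def flatten(value: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, item in value.items():
        name = f"{prefix}{key}"
        if isinstance(item, dict):
            flat.update(flatten(item, f"{name}."))
        else:
            flat[name] = item
    return flat


def partition_path(timestamp_ms: int) -> str:
    """Build a Hive-style date partition prefix from epoch milliseconds (UTC)."""
    day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return f"year={day:%Y}/month={day:%m}/day={day:%d}"


# Hypothetical log record with an app-specific `detail` payload.
record = {
    "timestamp": 1704110400000,
    "level": "info",
    "detail": {"orderId": 42, "customer": {"tier": "gold"}},
}

row = flatten(record)
prefix = partition_path(record["timestamp"])
```

Because the shape of `detail` differs per application, a generic flatten like this only gets the columns visible; any typing or renaming would still be custom processing injected by the developer.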
-
I finally wrote up some documentation on using #clusterless Tessellate for #data stuff