“cwensel” — Fediverse search results on home.social

Chris K Wensel @cwensel · 2024-01-02 · 20:18 UTC

@seldo I don't know any firsthand,

but I spent the last couple weeks exploring what a #RAG pipeline would look like so I could write a sample application/pipeline using my #OpenSource #clusterless project

https://github.com/ClusterlessHQ

unfortunately the idea I had ultimately wasn't suitable for RAG; it could be a simple BERT/BART summarizer pipeline without an OpenSearch/Elasticsearch backend or other vector db.

still looking for a fun RAG based prototype I could build and share.

#rag #opensource #clusterless

Chris K Wensel @cwensel · 2023-12-18 · 17:32 UTC

need to dig into this, but i've been doing replay (redrive) on #aws StepFunctions for years with my #data pipelines

https://aws.amazon.com/blogs/big-data/build-efficient-etl-pipelines-with-aws-step-functions-distributed-map-and-redrive-feature/

replay is one feature I haven't added back to #Clusterless yet, though all the metadata is there.

https://github.com/ClusterlessHQ

#aws #data #clusterless

Chris K Wensel @cwensel · 2023-12-13 · 19:11 UTC

@c_chep

currently all my #clusterless examples (and scenario tester) use jsonnet, but it's got weak overall support.

CUE looks interesting, but no Java implementation for embedding (if that was a thing I was considering)

#clusterless

Chris K Wensel @cwensel · 2023-12-06 · 17:31 UTC

Tessellate is now on Docker Hub

https://hub.docker.com/r/clusterless/tessellate

Tessellate is a command line tool for reading and writing #data to/from multiple locations and across multiple formats.

#clusterless

#data #clusterless

Chris K Wensel @cwensel · 2023-12-05 · 00:43 UTC

Automating #AWS CloudWatch log export into S3 is no simple task.

The next #clusterless release will have a new Component type called Activity, which is simply a scheduled task.

The first Activity will be a function that exports CloudWatch logs created within the previous interval.

As they arrive, any arc can subscribe to the data drop and do things. To simplify that task, I'll update #tessellate

The cw log is a delimited text file with two columns, one of which is json, unlike all the others in aws!

#aws #clusterless #tessellate
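
As an illustration of the two-column layout described in the post above, here is a minimal Python sketch of how such a file could be read. It is not taken from Clusterless or Tessellate: the function name `parse_exported_lines`, the single-space delimiter, and the timestamp-first column order are all assumptions made for the example.

```python
import json
from typing import Any, Iterable, Iterator, Tuple


def parse_exported_lines(
    lines: Iterable[str], delimiter: str = " "
) -> Iterator[Tuple[str, Any]]:
    """Yield (first_column, parsed_second_column) pairs.

    Assumption: each line holds two columns separated by `delimiter`,
    and the second column is usually a JSON document.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split only on the first delimiter so the JSON column stays intact.
        first, _, payload = line.partition(delimiter)
        try:
            detail = json.loads(payload)
        except json.JSONDecodeError:
            # Keep non-JSON payloads rather than dropping the record.
            detail = {"raw": payload}
        yield first, detail
```

Splitting on the first delimiter only matters because the JSON column may itself contain the delimiter character.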

Chris K Wensel @cwensel · 2023-11-29 · 23:10 UTC

ok, here's a new one for #aws users.

would anyone be interested in an automated way to extract CloudWatch logs (continuously) into an s3 bucket,

and have them converted into #parquet (/etc) for downstream custom processing, or simply partitioned with partition updates to AWS Athena/Glue?

the challenge for users is getting the `detail` json field exposed since it's app specific.

with #clusterless, devs could then inject custom processing for custom app logs into the #data pipeline

#aws #parquet #clusterless #data
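
To make the partitioning idea in the post above concrete, here is a small Python sketch that derives a Hive-style partition prefix from a log event timestamp, the kind of key layout Athena/Glue partitions commonly use. The function name `partition_prefix`, the `logs` base prefix, and the year/month/day/hour scheme are hypothetical choices for illustration, not something the post or Clusterless specifies.

```python
from datetime import datetime, timezone


def partition_prefix(epoch_millis: int, base: str = "logs") -> str:
    """Build a Hive-style partition prefix from a millisecond epoch timestamp.

    Assumption: events are partitioned by UTC year, month, day, and hour.
    """
    ts = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return f"{base}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"
```

Using UTC throughout keeps the prefix stable regardless of where the export job runs.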

Chris K Wensel @cwensel · 2023-11-17 · 17:40 UTC

I finally wrote up some documentation on using #clusterless Tessellate for #data stuff

https://docs.clusterless.io/tessellate/1.0-wip/index.html

#clusterless #data
