home.social

#datatools - Public Fediverse posts

Live and recent posts from across the Fediverse tagged #datatools, aggregated by home.social.

  1. 🎉 Milestone Unlocked: Finished the Data Engineering Zoomcamp!

    In 10 weeks, I moved from scripting to architecting systems. We built real production-grade infrastructure using Spark, Kafka, Airflow, and Kestra, not just hobby projects.

    Capstone: A Storage Hard Drive Dashboard using real failure data from Backblaze
    Stack: Terraform + Docker infra, Airflow orchestration, dbt modeling, Streamlit viz.

    Key Lessons:
    ✅ "It works on my laptop" isn't a strategy.
    ✅ Need IaC, partitioning, clustering, and strict error handling.
    ✅ dbt ensures reproducible, tested models.
    ✅ Infra is invisible work: if it breaks, your code fails.

    Take the leap! It's challenging, but by week 10 the pieces click into place. Seeing my pipeline run autonomously felt like crossing the finish line. 🏁

    Thanks Data Talks Club team! On to the next challenge!

    My project: github.com/ammartin8/hard_driv

    #mastodon #fediverse #data #spark #dataengineering #ai #technology #datatools #datapipelines #fedihire #thursday #sql #observability #etl #python #github

  6. Diving deep into Spark batch processing! ⚡️

    Learned how to:
    ✅ Optimize data pipelines with filtering, repartitioning & grouping
    ✅ Design efficient ETL pipelines with Spark
    ✅ Understand when and how to use partitioning strategies
    ✅ Use Google Cloud Storage (GCS) as a data source for Spark applications and configure Spark to read Parquet or other formats from GCS
    ✅ Visualize execution plans for efficient coding
    ✅ Review the Spark UI for performance monitoring

    💡 Key takeaway: One thing that amazes me about distributed computing is how we've gone from struggling with massive datasets to generating insights in near real-time. As an analyst who has dealt with long waits for data processing, I find Spark saves a lot of time: results arrive faster, so data-driven decisions can be made more quickly.

    Review my work here: github.com/ammartin8/data_engi
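
The filtering, repartitioning, and grouping steps named in the post above can be illustrated with a small plain-Python sketch. This is not Spark's API and not the author's code; the row fields (`zone`, `distance`) and the function names are made up for illustration. It only shows why filtering early and partitioning by the grouping key lets each partition be aggregated on its own.

```python
from collections import defaultdict


def filter_rows(rows, min_distance):
    """Drop rows early so later stages have less data to move."""
    return [r for r in rows if r["distance"] >= min_distance]


def repartition(rows, key, num_partitions):
    """Hash-partition rows so that equal keys land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def group_sum(partitions, key, value):
    """Sum `value` per `key`, one partition at a time."""
    totals = {}
    for part in partitions:
        local = defaultdict(float)
        for row in part:
            local[row[key]] += row[value]
        # Keys never span partitions, so a plain update is a safe merge.
        totals.update(local)
    return totals


# Hypothetical taxi-style rows, invented for this example.
rows = [
    {"zone": 1, "distance": 2.5},
    {"zone": 2, "distance": 4.0},
    {"zone": 1, "distance": 5.0},
    {"zone": 2, "distance": 0.2},  # removed by the filter below
]

filtered = filter_rows(rows, min_distance=1.0)
partitions = repartition(filtered, key="zone", num_partitions=2)
totals = group_sum(partitions, key="zone", value="distance")
print(totals)
```

In a real cluster each partition would live on a different worker, which is where the choice of partition key starts to matter for performance.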

  7. Just completed a project building an end-to-end data pipeline for NYC taxi data using dlt 🚕📊! What a ride! 😅 The REST API extraction was particularly fun (in a challenging way) but dlt's modular design made it manageable. Here's what I learned:

    ✅ Full life cycle: From REST API extraction to DuckDB loading, all in one framework
    ✅ Reproducibility: Tracked every transformation with dlt's lineage features
    ✅ Modular design: Defined reusable components for extracting and normalizing data
    ✅ Handles complexity: Seamlessly handled pagination from the API

    Big takeaway: dlt isn't just tooling; it's a framework for thinking about data pipelines that emphasizes transparency and reproducibility, both essential for any modern data stack.
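
To make the pagination point above concrete, here is a minimal plain-Python sketch of page-by-page extraction. It does not use dlt and does not reproduce the author's pipeline; `paginate`, `fake_fetch`, and the in-memory `FAKE_API` are hypothetical stand-ins, and the assumption that an empty page marks the end of the data is mine.

```python
from typing import Callable, Dict, Iterator, List


def paginate(fetch_page: Callable[[int], List[Dict]], start_page: int = 1) -> Iterator[Dict]:
    """Yield records page by page until the source returns an empty page."""
    page = start_page
    while True:
        records = fetch_page(page)
        if not records:  # assumption: an empty page means no more data
            break
        yield from records
        page += 1


# Hypothetical stand-in for a REST endpoint, keyed by page number.
FAKE_API = {
    1: [{"id": 1}, {"id": 2}],
    2: [{"id": 3}],
}


def fake_fetch(page: int) -> List[Dict]:
    """Return the records for one page, or an empty list past the last page."""
    return FAKE_API.get(page, [])


rows = list(paginate(fake_fetch))
print(rows)
```

Because `paginate` is a generator, a loader can consume records as they arrive instead of holding every page in memory.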

  8. ICYMI: 3 days left: Amazon is locking the door on the data tools sellers depended on: With Amazon's BSA Agent Policy taking effect in 3 days, an industry expert argues the review cap and new rules are closing the third-party AI data pipeline for good. ppc.land/3-days-left-amazon-is #Amazon #DataTools #BSAAgentPolicy #ThirdParty #AIData

  11. Discover RQLBrowser from Logilab, a sleek way to explore and test RQL queries! Great demo for devs, data folks, and curious tinkerers. Check out the UI, examples, and tips to speed up your data workflow. #RQL #RQLBrowser #Logilab #DataTools #DevTools #FOSS #PeerTube #English
    peertube.logilab.fr/videos/wat

  12. Building a Simple Tech Stack: Avoid Costly Mistakes and Boost Efficiency! #data #datatools

    How should you choose the best data tools to build an efficient stack? Keep it simple!

    Source: quadexcel.com/wp/building-a-si

  13. A new free time-series data analysis tool has launched! **Prophetize** offers fast forecasting with the Holt-Winters algorithm directly in the browser, with no upload to the cloud. It supports automatic seasonality detection, handling of complex date formats, and export of results to Excel. Easy to use, no login required. #Technology #TimeSeries #DataTools #PhầnMềmMiễnPhi #PhânTíchDữLiệu

    reddit.com/r/SideProject/comme
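
For readers unfamiliar with the Holt-Winters method mentioned in the post above, the following is a minimal sketch of its additive form in plain Python. It is not Prophetize's implementation, which the post does not show. The function name, the simple initialisation from the first two seasons, and the smoothing parameters are assumptions chosen for illustration.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialisation (an assumption of this sketch):
    # level = mean of season 1, trend = per-step change between season means,
    # seasonal offsets = deviation of season 1 from its own mean.
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) / m - level) / m
    seasonals = [y - level for y in series[:m]]

    for t in range(m, len(series)):
        y = series[t]
        s_prev = seasonals[t % m]
        prev_level = level
        # Update level, trend, and seasonal offset in that order.
        level = alpha * (y - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y - level) + (1 - gamma) * s_prev

    n = len(series)
    return [
        level + h * trend + seasonals[(n + h - 1) % m]
        for h in range(1, horizon + 1)
    ]


# A purely seasonal toy series with period 3 and no trend.
history = [10, 20, 30, 10, 20, 30, 10, 20, 30]
forecast = holt_winters_additive(
    history, season_length=3, alpha=0.5, beta=0.5, gamma=0.5, horizon=3
)
print(forecast)
```

On this toy series the level stays at 20, the trend stays at 0, and the forecast repeats the seasonal pattern 10, 20, 30. Real data needs the three smoothing parameters tuned, which tools of this kind typically do automatically.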

  14. 🪿 When AI gets confused by geese - call DotDotGoose!

    It's a super cute open-source tool for counting things in images by hand.
    Just click around and mark birds, bugs, butterflies - whatever you need 🐥🦋

    Made by the American Museum of Natural History for real science -
    but honestly, it's so cozy I just wanna sit and count little geese all day 🤍

    🔗 github.com/persts/DotDotGoose

    #GooseGang #Python #OpenSource #DevGirl #DataTools

  15. Need better insights from your data?
    Talk to @eazyBI: they're at #BalticRuby in red shirts and ready to help! 📊
    Thanks to our Bronze Sponsor for supporting the Ruby community with tools and good vibes! eazybi.com/

    #eazyBI #RubyOnRails #RubyConference #DataTools

  16. Democratizing AI access is our mission! That's why we've integrated Ollama with our (Un)Perplexed Spready software. See our comparison of which model, Mistral 7B or OLMo2 7B, might work best for your use case. matasoft.hr/qtrendcontrol/inde #AIForAll #Ollama #DataTools

  17. 🐤 Wow, just what we needed: a "lightweight" data framework that's built on top of #DuckDB 🦆 and #3FS 🔧, because clearly, our data didn't have enough buzzword sauces on it already. But hey, at least now we have another tool to sit idly in our #GitHub repos, collecting digital dust while we pretend it's the magic bullet for all data woes. 🎯✨
    github.com/deepseek-ai/smallpo #lightweightdata #datatools #HackerNews #ngated

  18. 📣 Published new free and public #DuckDB #SQLTools Preview v1.6.0 for VS Code IDE with DuckDB v0.10.2 support.

    📥 marketplace.visualstudio.com/i

    See our #ProDataTools repo for advanced #DuckDBPro schema, commands, and new feature updates:

    📰 github.com/RandomFractals/pro-

    #VSCode #DataTools 🛠️ 🧙‍♂️ ...

  23. I am so happy with the first web application of my own 🎉: Tris, a simple and free web crawler 🕸️ 🕷️!

    You can try it for free online: tris.fly.dev, limited to 3 parallel crawls and 100 links, with a path depth of 3.

    The next thing I will add is a text input to set a target domain. hhh, right now I am making it hard! 🙈

    #node #nodejs #web #webcrawler #crawler #seo #datatools #webscraper #scraping #seotools #seotool #tris #triswebcrawler #webapp #indie #indiedev
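
The link and depth limits described in the post above can be sketched as a bounded breadth-first crawl. This is not Tris's code, which is a Node.js app the post does not show. The sketch walks a made-up in-memory site instead of fetching pages, treats "depth" as the number of link hops from the start page (an assumption), and leaves out the limit on parallel crawls.

```python
from collections import deque


def crawl(start, get_links, max_depth=3, max_links=100):
    """Breadth-first crawl bounded by hop depth and total links collected."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link in seen:
                continue
            if len(visited) >= max_links:
                return visited  # link budget exhausted
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
    return visited


# Hypothetical site map standing in for real HTTP fetches.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/a/1/x": ["/a/1/x/deep"],
    "/b": ["/"],
}


def site_links(url):
    """Return the outgoing links of a page in the made-up site."""
    return SITE.get(url, [])


pages = crawl("/", site_links, max_depth=3, max_links=100)
print(pages)
```

The `seen` set is what keeps the cycle between `/b` and `/` from looping forever, and the page four hops away is never reached.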

  24. Our #DataPreview 🈸 extension recently crossed 500,000 installs in Visual Studio Code Marketplace.

    We plan to release a Data Preview Pro version with large data file support, a new Prospective Data Viewer, and better support for Arrow and Parquet data files next year for our Pro Sponsors on GitHub.

    Meanwhile, try our older open source version from GitHub or VS Code Marketplace:

    📰 github.com/RandomFractals/vsco

    📥 marketplace.visualstudio.com/i

    #VSCode #DataTools 🛠️ / #ProDataTools 🧙 ...

  25. Started working on a new #PGSqlTools plugin this month.

    Similar to our #DuckDBPro and #SQLitePro #DataTools 🛠️, this extension will display all #Postgres DB objects in Connections Explorer and provide information schema, system catalogs, and other standard PG view shortcut commands for Postgres, Cockroach, and many other #pgwire compatible databases.

    Aiming to support most of the popular PostgreSQL flavor databases from this list: wiki.postgresql.org/wiki/Postg

    #VSCode #PostgreSQL #SQLTools #ProDataTools 🧙

  30. #DuckDB devs & users, we just hit 5K installs & over 7.6K downloads of our free & public #DuckDBSQLTools extension for #VSCode IDE.

    DuckDB Labs & co. should just buy that extension to get more traction in the #DataTools field with #SQLTools devs, data scientists, data analysts, and other users using & exploring their DB in Code 😉

    📥 marketplace.visualstudio.com/i

    #DuckDBTools 🛠️ #ProDataTools 🧙‍♀️ ...

  35. Our #DataTableRenderers 🈸 / 🈷️ for #VSCode Notebooks 📚 recently crossed 67K installs and 100K downloads. Thanks to all the devs and data scientists using them to view tabular data outputs and data summaries in Jupyter notebooks! 🤗

    📥 marketplace.visualstudio.com/i

    Sponsor our #DataTools 🛠️ work on GitHub, and sign up for our Pro sponsor tier to get notified about new #DataNotebook 📓 Pro Tools coming to VS Code IDE soon:

    ✍️ github.com/RandomFractals/pro-

    #DataTable 中 / #DataNotebooks 📚 / #ProDataTools 🧙 ...

  40. If you are still looking for some good simple #DataViewers to use in #VSCode IDE, try our #RandomFractalsInc viewers and #DataTools 🛠️ from the top row of tiles and extensions in VS Code marketplace search results for #DataViewer:

    📰 marketplace.visualstudio.com/s

    Use them for local and remote data preview 🈸, maps 🗺️, graphs 📈📊, and tabular data display of flat data files and dataframes in Jupyter notebooks 📚.

    #ProDataTools 🧙‍♂️ ...
