home.social

#datatools - Public Fediverse posts

Live and recent posts from across the Fediverse tagged #datatools, aggregated by home.social.

  1. 🎉 Milestone Unlocked: Finished the Data Engineering Zoomcamp!

    In 10 weeks, I moved from scripting to architecting systems. We built real production-grade infrastructure using Spark, Kafka, Airflow, and Kestra, not just hobby projects.

    Capstone: A Storage Hard Drive Dashboard using real failure data from Backblaze
    Stack: Terraform + Docker infra, Airflow orchestration, dbt modeling, Streamlit viz.

    Key Lessons:
    ✅ "It works on my laptop" isn't a strategy.
    ✅ Need IaC, partitioning, clustering, and strict error handling.
    ✅ dbt ensures reproducible, tested models.
    ✅ Infra is invisible work: if it breaks, your code fails.

    Take the leap! It’s challenging but by week 10, pieces click into place. Seeing my pipeline run autonomously felt like crossing the finish line. 🏁

    Thanks Data Talks Club team! On to the next challenge!

    My project: github.com/ammartin8/hard_driv

    #mastodon #fediverse #data #spark #dataengineering #ai #technology #datatools #datapipelines #fedihire #thursday #sql #observability #etl #python #github

  6. Diving deep into Spark batch processing! ⚡️

    Learned how to:
    ✅ Optimize data pipelines with filtering, repartitioning & grouping
    ✅ Design efficient ETL pipelines with Spark
    ✅ Understand when and how to use partitioning strategies
    ✅ Use Google Cloud Storage (GCS) as a data source for Spark applications and configure Spark to read Parquet or other formats from GCS
    ✅ Visualize execution plans for efficient coding
    ✅ Review the Spark UI for performance monitoring

    💡 Key takeaway: One thing that amazes me about distributed computing is how we've gone from struggling with massive datasets to generating insights in near real-time. As an analyst who has dealt with long wait times in processing data, I find Spark saves so much time in getting results and lets me make data-driven decisions more quickly.

    Review my work here: github.com/ammartin8/data_engi

  7. Just completed a project building an end-to-end data pipeline for NYC taxi data using dlt 🚕📊! What a ride! 😅 The REST API extraction was particularly fun (in a challenging way) but dlt's modular design made it manageable. Here’s what I learned:

    ✅ Full life cycle: From REST API extraction to DuckDB loading, all in one framework
    ✅ Reproducibility: Tracked every transformation with dlt's lineage features
    ✅ Modular design: Defined reusable components for extracting and normalizing data
    ✅ Handles complexity: Seamlessly handled pagination from the API
    Big takeaway: dlt isn't just tooling, it's a framework for thinking about data pipelines that emphasizes transparency and reproducibility, which is essential for any modern data stack.
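A minimal sketch of the pagination idea mentioned in the post above, written in plain Python rather than with dlt's own API. The `paginate` helper, the `fake_fetch_page` function, the `items` / `next_page` field names, and the sample trip records are all hypothetical stand-ins for a real paginated REST endpoint.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def paginate(fetch_page: Callable[[int], Page], start: int = 1) -> Iterator[dict]:
    """Yield records page by page until the API reports no further page."""
    page_number: Optional[int] = start
    while page_number is not None:
        page = fetch_page(page_number)
        yield from page["items"]
        # Assumption: the API returns the next page number, or None when done.
        page_number = page.get("next_page")


# Hypothetical in-memory stand-in for a paginated taxi-trip endpoint.
FAKE_TRIPS = [{"trip_id": i, "fare": 10.0 + i} for i in range(5)]


def fake_fetch_page(page_number: int, page_size: int = 2) -> Page:
    start = (page_number - 1) * page_size
    items = FAKE_TRIPS[start:start + page_size]
    has_more = start + page_size < len(FAKE_TRIPS)
    return {"items": items, "next_page": page_number + 1 if has_more else None}


trips = list(paginate(fake_fetch_page))
```

Because `paginate` is a generator, a loader can consume records as they arrive instead of holding every page in memory first.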

  8. ICYMI: 3 days left: Amazon is locking the door on the data tools sellers depended on: With Amazon's BSA Agent Policy taking effect in 3 days, an industry expert argues the review cap and new rules are closing the third-party AI data pipeline for good. ppc.land/3-days-left-amazon-is #Amazon #DataTools #BSAAgentPolicy #ThirdParty #AIData

  11. Discover RQLBrowser from Logilab, a sleek way to explore and test RQL queries! Great demo for devs, data folks, and curious tinkerers. Check out the UI, examples, and tips to speed up your data workflow. #RQL #RQLBrowser #Logilab #DataTools #DevTools #FOSS #PeerTube #English
    peertube.logilab.fr/videos/wat

  12. Building a Simple Tech Stack: Avoid Costly Mistakes and Boost Efficiency! #data #datatools

    How should you choose the best data tools to make an efficient stack? Keep it simple!

    quadexcel.com/wp/building-a-si

  13. A new free time-series data analysis tool has launched! **Prophetize** enables quick forecasting with the Holt-Winters algorithm directly in the browser, with no upload to the cloud. It supports automatic seasonality detection, complex date handling, and export of results to Excel. Easy to use, no login required. #Technology #TimeSeries #DataTools #PhầnMềmMiễnPhi #PhânTíchDữLiệu

    reddit.com/r/SideProject/comme
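For readers unfamiliar with the Holt-Winters algorithm named in the post above, here is a minimal sketch of its additive form (triple exponential smoothing). It is not Prophetize's code. The function name, the default smoothing parameters, and the simple initialization from the first two seasons are assumptions chosen for illustration.

```python
from typing import List, Sequence


def holt_winters_additive(
    series: Sequence[float],
    season_length: int,
    horizon: int,
    alpha: float = 0.5,   # level smoothing (assumed default)
    beta: float = 0.1,    # trend smoothing (assumed default)
    gamma: float = 0.1,   # seasonal smoothing (assumed default)
) -> List[float]:
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialization: level and seasonal offsets from the first season,
    # trend from the change in seasonal means between the first two seasons.
    first_mean = sum(series[:m]) / m
    second_mean = sum(series[m:2 * m]) / m
    level = first_mean
    trend = (second_mean - first_mean) / m
    seasonals = [series[i] - first_mean for i in range(m)]

    for t in range(m, len(series)):
        value = series[t]
        seasonal = seasonals[t % m]
        previous_level = level
        level = alpha * (value - seasonal) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (value - level) + (1 - gamma) * seasonal

    n = len(series)
    return [
        level + h * trend + seasonals[(n + h - 1) % m]
        for h in range(1, horizon + 1)
    ]


history = [10, 20, 30, 20] * 3
forecast = holt_winters_additive(history, season_length=4, horizon=4)
```

For a perfectly periodic series with no trend, such as `history` here, the forecast simply repeats the seasonal pattern. Real data would also need the smoothing parameters tuned, for example by minimizing the one-step-ahead error.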

  14. 🪿 When AI gets confused by geese - call DotDotGoose!

    It’s a super cute open-source tool for counting things on images by hand.
    Just click around and mark birds, bugs, butterflies - whatever you need 🐥🦋

    Made by the American Museum of Natural History for real science -
    but honestly, it’s so cozy I just wanna sit and count little geese all day 🤍

    🔗 github.com/persts/DotDotGoose

    #GooseGang #Python #OpenSource #DevGirl #DataTools

  15. Need better insights from your data?
    Talk to @eazyBI: they're at #BalticRuby in red shirts and ready to help! 📊
    Thanks to our Bronze Sponsor for supporting the Ruby community with tools and good vibes! eazybi.com/

    #eazyBI #RubyOnRails #RubyConference #DataTools

  16. Democratizing AI access is our mission! That's why we've integrated Ollama with our (Un)Perplexed Spready software. See our comparison of which model, Mistral 7B or OLMo2 7B, might work best for your use case. matasoft.hr/qtrendcontrol/inde #AIForAll #Ollama #DataTools

  17. I am so happy with the first web application of my own 🎉 that I have developed: Tris, a simple and free web crawler 🕸️ 🕷️!

    You can try it for free online at tris.fly.dev; it is limited to 3 parallel crawls and 100 links, with a path depth of 3.

    Next thing I will add will be a text input to set a target domain hhh, now I am making it hard! 🙈

    #node #nodejs #web #webcrawler #crawler #seo #datatools #webscraper #scraping #seotools #seotool #tris #triswebcrawler #webapp #indie #indiedev
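A rough sketch of how link-count and depth limits like the ones described in the post above could be enforced in a breadth-first crawl. This is not Tris's code. The `crawl` function, the `get_links` callback, and the `SITE` link graph are hypothetical, and the limit on parallel crawls is not modeled.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_links: int = 100,
    max_depth: int = 3,
) -> List[str]:
    """Breadth-first crawl that stops at max_links pages or max_depth hops."""
    visited = [start_url]
    seen = {start_url}
    queue: Deque[Tuple[str, int]] = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link in seen:
                continue
            if len(visited) >= max_links:
                return visited  # link budget exhausted
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
    return visited


# Hypothetical in-memory site used instead of real HTTP requests.
SITE: Dict[str, List[str]] = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/a/1/x": ["/too-deep"],
    "/b": ["/"],
}

pages = crawl("/", lambda url: SITE.get(url, []))
```

Here the start page counts as depth 0 and toward the link budget; a real crawler might define both limits differently.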

  18. Our #DataTableRenderers 🈸 / 🈷️ for #VSCode Notebooks 📚 recently crossed 67K installs and 100K downloads. Thanks to all the devs and data scientists using them to view tabular data outputs and data summaries in Jupyter notebooks! 🤗

    📥 marketplace.visualstudio.com/i

    Sponsor our #DataTools 🛠️ work on GitHub, and sign up for our Pro sponsor tier to get notified about new #DataNotebook 📓 Pro Tools coming to VS Code IDE soon:

    ✍️ github.com/RandomFractals/pro-

    #DataTable 中 / #DataNotebooks 📚 / #ProDataTools 🧙 ...

  19. Running #SQL queries in our new #DataNotebook 📓 @code extension, rendering results with simple #DataTable, #DataSummary 🈷️ & #FlatDataGrid 中 from our #DataTableRenderers 🈸 + CSV #DataExport from VSCode Notebook cell output all in one go. We doubt #MalloyData is as flexible. 😎 #DataTools 🔬 ...

  20. Our #DataTableRenderers 🈸 for #VSCode Notebooks 📚 has over 30,000 installs. It's one of the most widely used #dataNotebook 📓 extensions in VS marketplace. Extension includes scrollable #dataTable, #flatDataGrid & #dataSummary output renderers. Try it!
    📥 marketplace.visualstudio.com/i

    #dataTools 🛠️ 💎💎💎...

  21. 🐤 Wow, just what we needed: a "lightweight" data framework that’s built on top of #DuckDB 🦆 and #3FS 🔧, because clearly, our data didn’t have enough buzzword sauces on it already. But hey, at least now we have another tool to sit idly in our #GitHub repos, collecting digital dust while we pretend it’s the magic bullet for all data woes. 🎯✨
    github.com/deepseek-ai/smallpo #lightweightdata #datatools #HackerNews #ngated

  22. 📣 Published new free and public #DuckDB #SQLTools Preview v1.6.0 for VS Code IDE with DuckDB v0.10.2 support.

    📥 marketplace.visualstudio.com/i

    See our #ProDataTools repo for advanced #DuckDBPro schema, commands, and new features updates:

    📰 github.com/RandomFractals/pro-

    #VSCode #DataTools 🛠️ 🧙‍♂️ ...

  23. Our #DataPreview 🈸 extension recently crossed 500,000 installs in Visual Studio Code Marketplace.

    We plan to release Data Preview Pro version with large data files support, new Prospective Data Viewer and better support of Arrow and Parquet data files next year for our Pro Sponsors on GitHub.

    Meanwhile, try our older open source version from GitHub or VS Code Marketplace:

    📰 github.com/RandomFractals/vsco

    📥 marketplace.visualstudio.com/i

    #VSCode #DataTools 🛠️ / #ProDataTools 🧙 ...

  24. Started working on new #PGSqlTools plugin this month.

    Similar to our #DuckDBPro and #SQLitePro #DataTools 🛠️, this extension will display all #Postgres DB objects in Connections Explorer, provide information schema, system catalogs and other standard PG view shortcut commands for Postgres, Cockroach and many other #pgwire compatible databases.

    Aiming to support most of the popular PostgreSQL flavor databases from this list: wiki.postgresql.org/wiki/Postg

    #VSCode #PostgreSQL #SQLTools #ProDataTools 🧙

  25. #DuckDB devs & users, we just hit 5K installs & over 7.6K downloads of our free & public #DuckDBSQLTools extension for #VSCode IDE.

    DuckDB Labs & co. should just buy that extension to get more traction in the #DataTools field with #SQLTools devs, data scientists, data analysts, and other users using & exploring their DB in Code 😉

    📥 marketplace.visualstudio.com/i

    #DuckDBTools 🛠️ #ProDataTools 🧙‍♀️ ...

  26. If you are still looking for some good simple #DataViewers to use in #VSCode IDE, try our #RandomFractalsInc viewers and #DataTools 🛠️ from the top row of tiles and extensions in VS Code marketplace search results for #DataViewer:

    📰 marketplace.visualstudio.com/s

    Use them for local and remote data preview 🈸, maps 🗺️, graphs 📈📊, tabular data display of flat data files and dataframes in Jupyter notebooks 📚.

    #ProDataTools 🧙‍♂️ ...

  27. For all the brave devs & #devCrews into building nifty #dataTools 🛠️ out there 😉 ...

    Our #RandomFractalsInc works on @github circa October 2023 🤗

    💖 github.com/RandomFractals

    #RandomFractals #dataTools 🧙‍♂️ ...

  28. We also wonder why 🤫. Maybe @OpenAtMicrosoft should open source their #DataWrangler if they want to compete with our dated OSS #dataTools 🛠️ in #VSCode and other 3rd party extensions that outrank them 🤗

    🧐 github.com/microsoft/vscode-da

  29. A few remaining #ObservableJS 📓 to #Quarto doc conversion & docs tickets in our old Observable #DataTools 🛠️ repo to guide data analysts and data scientists exploring that space and tools ecosystem on how to use them in #VSCode IDE ...

    📰 github.com/RandomFractals/obse

    #DataNotebooks 📚

  30. I hosted the PipeRider Community Office Hours yesterday and one cool feature we demoed was this online dbt manifest viewer.

    Generate a Lineage Diff from just your dbt project's manifest files.

    Try it out for yourself here:

    staging.cloud.piperider.io/onl

    Watch it in action on YouTube:

    youtube.com/watch?v=LHuTb3e_4O

    #DataEngineering #DataViz #dbt #DataTools #AnalyticsEngineering #OpenSource #PipeRider
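To illustrate what a lineage diff between two dbt manifests involves, here is a small sketch. It is not PipeRider's implementation. It assumes each manifest is a parsed `manifest.json` with a top-level `nodes` mapping whose entries list their parents under `depends_on.nodes`; the function names and the trimmed sample manifests are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (parent unique_id, child unique_id)


def lineage_edges(manifest: dict) -> Set[Edge]:
    """Collect parent -> child edges from a parsed dbt manifest."""
    edges: Set[Edge] = set()
    for node_id, node in manifest.get("nodes", {}).items():
        for parent_id in node.get("depends_on", {}).get("nodes", []):
            edges.add((parent_id, node_id))
    return edges


def lineage_diff(base: dict, target: dict) -> Dict[str, list]:
    """Report nodes and edges added or removed between two manifests."""
    base_nodes = set(base.get("nodes", {}))
    target_nodes = set(target.get("nodes", {}))
    base_edges = lineage_edges(base)
    target_edges = lineage_edges(target)
    return {
        "added_nodes": sorted(target_nodes - base_nodes),
        "removed_nodes": sorted(base_nodes - target_nodes),
        "added_edges": sorted(target_edges - base_edges),
        "removed_edges": sorted(base_edges - target_edges),
    }


# Hypothetical, heavily trimmed manifests; real ones would be loaded
# with json.load() from each project's manifest file.
base_manifest = {"nodes": {
    "model.shop.stg_orders": {"depends_on": {"nodes": []}},
    "model.shop.orders": {"depends_on": {"nodes": ["model.shop.stg_orders"]}},
}}
target_manifest = {"nodes": {
    "model.shop.stg_orders": {"depends_on": {"nodes": []}},
    "model.shop.stg_payments": {"depends_on": {"nodes": []}},
    "model.shop.orders": {"depends_on": {"nodes": [
        "model.shop.stg_orders", "model.shop.stg_payments"]}},
}}

diff = lineage_diff(base_manifest, target_manifest)
```

A fuller diff would also compare node contents (for example compiled SQL or checksums) to flag modified models, which this sketch leaves out.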

  31. Users of #VSCodium IDE can also use our new #DuckDBSqlTools extension.

    1️⃣ Download duckdb-sql-tools-x.x.x.vsix 📦 from our extension releases on github:

    📥 github.com/RandomFractals/duck

    2️⃣ Use Install from #VSIX feature to install #DuckDB #SqlTools 📦👇

    ▶️ twitter.com/TarasNovak/status/

    #SQL 📜 #DataTools 🧙‍♂️ 🔬 💎 ...

  32. Quick example of running #SQL query in our new #DataNotebook 📓 code extension we plan to release to our Pro sponsors on github soon ...

    Query results are displayed using our public #DataTableRenderers 🈸 extension for VSCode IDE:

    📥 marketplace.visualstudio.com/i

    #DataNotebooks 📚 ... #DataTools 🔬 ...

  33. Took me a few hours to whip my new #DataNotebook 📓 extension base code into shape today. Got #PRQL loaded into vscode notebook view using Open With option. Will wire SQL gen., query & results/output data rendering with our #DataTableRenderers this week for sure! #DataTools 🔬 ...

  34. Gonna create new #DuckDBSqlNotebook 📓 code extension next month.

    Who'd like to see it working in code notebooks by loading and executing Sql statements from plain Sql files?

    DuckDB Sql Notebook extension will have new custom DuckDB connection explorer, use vscode notebook user interface, and integrate with our #DataTableRenderers for scrollable query results display.

    #DuckDB #DataTools / #DataNotebooks 📚 ...

  35. Taking down fully staffed @Google #dataTools in @code team in 2 weeks of coding & 4+ months in marketplace ... @lloydtabb & your #MalloyData crew been pwned in installs in @code:

    📥 marketplace.visualstudio.com/i

    #ProDataTools 🧙‍♂️ ... 💎💎💎 ... 😎

    P.S.: add #duckdb keyword to yours..

  36. Talk about #DataTools 🪄 & #DuckDB in @code FTW: #MalloyData has been in dev w/fully staffed dev team since January 2022 & we put together our free trial #DuckDBSqlTools in a month in January 2023. One in VS marketplace for over a year & a half, the other only 4 months. 💎💎💎

  37. After eval of #MalloyData tools, I am going back to #PRQL. Their repo, tools, docs & api look more mature to me, and should enable adding SQL queries to my data viewers next year.

    PRQL playground: prql-lang.org/playground/

    VSCode extension you can try: marketplace.visualstudio.com/i

    #dataTools 🛠️ ...

    Explore more at: github.com/prql/prql#explore