home.social

#bigquery — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #bigquery, aggregated by home.social.

  1. One day at #Google Munich for the new services from Google.

    Lots of new stuff. But one thing I see throughout the industry is that so many #aiagents examples don't require AI at all; they are pure automation.

    Also interesting to see that AI services can be called directly in #bigquery... But thinking about the #aiact and the respective #governance, this means that a lot of applications should appear in the company's AI register. 😕

    Not a hot topic, but grounds for more scanning of the query log. And... automation, of course with an agent 😉

    #googlecloudnext2026
    #googlecloudnext

  2. CloudSync MLBridge and Its Impact…

    CloudSync MLBridge is a tool designed to make it easier to synchronize data between Google Cloud Datastore and BigQuery. It lets companies integrate their systems efficiently, cutting the time needed to keep data current in real time.

    norvik.tech/news/analisis-clou

    #Technology #Cloudsync #GoogleCloud #Bigquery #Datastore #NorvikTech #DesarrolloSoftware #TechInnovation

  3. I'm hiring an Analytics Engineer (GCP) to join my team at RHR International.
    What you'd actually be doing: building and owning our analytics foundation in a GCP-first Google Cloud environment — BigQuery, Data/Looker Studio, Python, SQL, GitHub, Docker. Real production work, version-controlled and documented, not throwaway queries.
    RHR is a leadership consulting firm that's been around for 80+ years. We're cloud-first, SaaS-only, no on-prem. Small IT team, which means your work matters immediately.
    What I'm looking for beyond the technical skills: curiosity, self-direction, and the ability to explain what you built and why to people who don't write code. Bonus points if you've fixed something nobody asked you to fix.
    Hybrid in Chicago preferred, remote considered.
    Apply here: linkedin.com/jobs/view/4399748
    If you know someone who fits, I'd appreciate the tag or share.

    #Hiring
    #AnalyticsEngineer
    #GCP
    #BigQuery
    #DataEngineering
    #Chicago
    #RHRInternational
    #Google
    #GoogleCloud
    #GoogleCloudPlatform

  4. Data Studio is back: Google kills Looker Studio name for good: Google reversed its 2022 Looker Studio rebrand on April 11, 2026, restoring the Data Studio name and expanding the platform with BigQuery agents and Colab apps. ppc.land/data-studio-is-back-g #DataStudio #Google #BigQuery #LookerStudio #DataAnalytics

  6. geoparquet-io: Fast #GeoParquet tool: geoparquet-io is an open-source #CLI tool and #Python library for converting, inspecting, optimizing, and partitioning #GeoParquet files, automatically applying GeoParquet performance best practices along the way. Its extract command can pull geodata from sources such as #WFS, #Esri ArcGIS Feature Services, or #BigQuery into GeoParquet.
    spatialists.ch/posts/2026/04/0 #GIS #GISchat #geospatial #SwissGIS

  7. TCO, or the Total Cost of Ownership of Modern ETL Approaches for MPP Databases

    What this article is about: I want to compare the TCO of good old ETL tools such as Informatica, ODI, MarkitEDM and the like vs. dbt + Airflow and the like. It is easy to compare the cost of licenses, or of compute and storage in the case of a cloud database, but TCO is much harder: the cost of building a single feature, the cost of support, of maintenance, of changes. It is tempting to count only license and compute costs and assume everything else is equal, but it is not. Cloud MPP databases are usually cheaper for storage and compute and carry no license fee, so it is tempting to take the same license-free approach to ETL as well, but there are drawbacks:

    habr.com/ru/articles/1014362/

    #mppбазы #informatica #dbt #etl #airflow #oracle #bigquery

  8. Claude Code + BigQuery → an analytics agent that works on your data 24/7

    No copying queries around. No middlemen. No switching between tools.

    All of this by connecting Claude directly to BigQuery via MCP.

    #iToSięLiczy
    #AI #BigQuery #GoogleCloud #GA4 #DataDrivenMarketing #Automatyzacja #MarketingAnalytics

  9. Building an Intelligent Q&A System and Corporate Knowledge Base on StarRocks + DeepSeek

    Typical scenarios built on StarRocks + DeepSeek. DeepSeek generates high-quality embeddings and answers; StarRocks provides highly efficient vector search and storage. Together they form the foundation for accurate and scalable AI solutions.

    habr.com/ru/articles/980410/

    #starrocks #deepseek #vector_index #rag #bigdata #bigquery
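
    The embedding-plus-vector-search split above can be sketched in a few lines. This is a hedged illustration only: the trivial bag-of-words `embed` stands in for a DeepSeek embedding call, and a linear scan over a list stands in for a StarRocks vector index.

```python
import math

# Stand-in for a DeepSeek embedding call: a trivial bag-of-words vector.
VOCAB = ["invoice", "refund", "shipping", "contract", "salary"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Stand-in for a StarRocks vector index: a list scanned by cosine similarity.
knowledge_base = [
    "how to request a refund for a damaged item",
    "shipping times for international orders",
    "salary review process for employees",
]
index = [(doc, embed(doc)) for doc in knowledge_base]

def search(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k most similar knowledge-base entries for a query."""
    qv = embed(query)
    ranked = sorted(index, key=lambda d: cosine(qv, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

print(search("where is my refund"))
```

    In a real deployment, `embed` would call the embedding model and `search` would run an approximate-nearest-neighbour query against the vector index rather than a full scan.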

  13. #ITByte: Amazon #Redshift and Google #BigQuery are two of the most popular fully managed, petabyte-scale cloud #Data warehouses.

    Here is a short comparison between the two.

    knowledgezone.co.in/posts/Clou

  14. RE: saptodon.org/@nextlytics/11550

    Our #webinar from last week is available as an on-demand recording for anyone who missed it. How can #SAP Business Data Cloud interact with a wider ecosystem of modern data platforms like #Databricks, #Snowflake, #BigQuery, and (new this week) #Fabric? Where does this trend lead?

    Spoiler: maybe truly open players have the advantage in the future interoperable data ecosystem over old-fashioned proprietary-first vendors...

    #datascience #dataengineering #datawarehouse #datalakehouse #lakehouse

  15. 🚀 TopicWatchdog – Week 3: Stable Topics with BERTopic

    KMeans worked, but cluster IDs kept jumping across retrains. This week I added a Python BERTopic stage with a BigQuery registry → stable topic IDs!

    🟢 UMAP + HDBSCAN
    🟢 Stable IDs via registry
    🟢 Auto-labels with Gemini
    🟢 Looker Studio dashboards

    📊 3,802 topics → 2,472 mapped, top clusters: migration, economy, climate, politics.
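
    The stable-ID idea above can be sketched roughly like this. It is a hedged illustration, not the author's actual scheme: a plain dict stands in for the BigQuery registry table, and topics are matched by normalized label rather than by embedding distance.

```python
# Minimal sketch of a topic registry that keeps IDs stable across retrains.
registry: dict[str, int] = {}   # normalized label -> stable topic id
next_id = 0

def stable_id(label: str) -> int:
    """Return the registered ID for a topic label, minting a new one if unseen."""
    global next_id
    key = label.strip().lower()
    if key not in registry:
        registry[key] = next_id
        next_id += 1
    return registry[key]

# Retrain 1 assigns arbitrary cluster numbers; we map them to stable IDs.
run1 = {0: "Migration", 1: "Economy", 2: "Climate"}
ids1 = {stable_id(lbl) for lbl in run1.values()}

# Retrain 2 shuffles the cluster numbers, but the stable IDs stay put.
run2 = {0: "Climate", 1: "Migration", 2: "Economy"}
ids2 = {stable_id(lbl) for lbl in run2.values()}

print(ids1 == ids2)  # same stable IDs despite reshuffled cluster numbers
```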

    👉 Blog: dracoblue.net/dev/topicwatchdo

    #TopicWatchdog #BERTopic #BigQuery
    #Clustering
    #MachineLearning
    #FediScience

  16. „Kickoff (Week 1): Extracting Topics & Claims from German Politics Videos“

    dracoblue.net/dev/kickoff-topi

    „Limitations: small sample size (≈175 videos), no stable Topic IDs yet, clustering not applied, and claims only minimally canonicalized.“

    #LLM #gcp #bigquery #YouTube #Politics #Research #transcripts #gemini

  17. How messy is #terraform with #gcp?

    I'm trying to build a system where a #rust worker ingests HTTP request data via #cloudrun and passes it into #bigtable, from which a further ingestion recipe exports the data into #bigquery.

    I tried to write a complete Terraform declaration for this but ran into permission issues. Then I tried a setup that first generates all the artefacts, like service accounts and the Docker image, and then refers to them from the Terraform builds, but I hardly see the value of doing it that way.

    Does anyone have an example of #cloudrun #cdc? I am new to this and I feel really slow.

    #lazyweb #askfedi

  18. New geospatial data in Google BigQuery: #Google is adding geospatial content to its #DWH solution #BigQuery. Additions encompass annotated Street View #imagery, Places (#POI) data, and #traffic data, among others.
    spatialists.ch/posts/2025/04-1 #GIS #GISchat #geospatial #SwissGIS

  19. Diving into #Vermont wildlife for the #30DayChartChallenge "circle" day! 🦌 Using #Python & #plotly to compare monthly #Moose and #BlackBear sightings 🐻 Data wrangled with #BigQuery and #SQL. Any guesses which animal is seen more consistently throughout the year? 😜 #DataViz #Wildlife #RadialChart

  20. GA4 intraday exports and cookieless pings

    I build a lot of reports for clients that use the GA4 BigQuery export as a source.

    Now.. that works like a charm. But.. you will need to wait some time to get processed data from the events_ tables.

    More recent data will appear in the streaming _intraday_ tables, if you have that enabled. But.. that data is not always complete! Especially when your site has consent mode enabled, and does not set a cookie until after consent.

    Here’s how it works:

    The scenario

    Someone visits the site for the first time (source: some campaign), gets confronted with the cookie banner, and then clicks accept.

    We tagged the site correctly, so this is what happens:

    1. a page_view event triggers (with URL parameters) – and notices analytics consent is denied (the default)
    2. the tracker attaches some parameters to this hit, to help processing
      • a session is started
      • this is the first visit
    3. there is an item list on the page: view_item_list event is triggered
    4. the cookiebanner pops up (event: cookiebar_view)
    5. the visitor clicks accept (event: cookiebar_accept) and the tracker gets sent a granted signal
    6. now the cookie can be used, and is attached to an automatic user_engagement event

    Sounds simple. Now, let's see what is streamed into BigQuery:

    The streaming data gap

    Basically, the intraday tables store what happens, as it happens.

    • cookie field ( user_pseudo_id ) is filled in on hits on/after consent
    • cookie field is NULL for hits before consent

    As it should be, right? But there’s a third bullet:

    • first batch of events will not appear in the intraday table!

    Here’s what we see (most recent hit first, read from bottom to top)

    1. the page_view is missing in the streaming table
    2. the collected_traffic_source information is missing (it is always only filled in on the first batch of events)
    3. As a byproduct, we also do not see the session start and first visit
    4. the other events are all sent without a cookie
    5. after consent, we see the user_pseudo_id – finally
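
    The gap can be illustrated with a tiny simulation. This is a hedged sketch, not GA4 code: it drops the first batch and blanks the cookie on pre-consent hits, mirroring the behaviour described above; the `abc123` pseudo-ID is made up.

```python
# Toy event stream: (batch_number, event_name, consent_granted_at_this_hit)
stream = [
    (1, "page_view", False),        # first batch, before consent
    (2, "view_item_list", False),
    (2, "cookiebar_view", False),
    (3, "cookiebar_accept", False),
    (4, "user_engagement", True),   # first hit after consent
]

def intraday_view(events):
    """What the streaming intraday table shows: the first batch is missing,
    and user_pseudo_id is NULL on all hits before consent."""
    rows = []
    for batch, name, consented in events:
        if batch == 1:
            continue  # the first batch never reaches the intraday table
        rows.append({"event_name": name,
                     "user_pseudo_id": "abc123" if consented else None})
    return rows

def processed_view(events):
    """What the next-day processed table shows: every event has a row,
    and Google has backfilled the cookie on pre-consent hits."""
    return [{"event_name": name, "user_pseudo_id": "abc123"}
            for _, name, _ in events]

print(len(intraday_view(stream)), len(processed_view(stream)))
```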

    The next day.. Google has glued it all together

    Processed data: every event has a row

    The following is in the processed data: (most recent hit first, read from bottom to top)

    • The page_view event and all other events leading up to the consent have a cookie attached to it! Google rescued that information
    • the “Attached” parameters to the hit expand to two extra rows
      • session_start
      • first_visit
    • we have source information: collected_traffic_source is present – on the first batch, as normal

    Not visible in the screenshot: session_traffic_source_last_click – the session information is properly filled in.

    The consequences

    If you decide to use intraday tables in your BigQuery reports: be aware that although the information is fresh (no pun intended, GA360 users), it's incomplete.

    • intraday misses crucial events, namely the first batch (most often a page_view)
      • bye bye landing_page reports based on page_views
      • bye bye traffic source reports based on session_traffic_source_last_click or collected_traffic_source
    • intraday misses cookies on some events
      • which is not too much of an issue, really
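
    A common workaround, with the above caveats in mind, is to union the processed daily tables with the intraday tables, taking intraday rows only for dates that have not been processed yet. Here is a hedged sketch that just builds the SQL string; the dataset name is hypothetical, while `events_*` and `events_intraday_*` are the standard GA4 export table names.

```python
def ga4_union_query(dataset: str, processed_until: str) -> str:
    """Build a query that reads processed daily tables up to a cutoff date
    and falls back to the streaming intraday tables after it.
    `processed_until` is a YYYYMMDD table suffix, e.g. '20260410'."""
    return f"""
SELECT event_date, event_name, user_pseudo_id
FROM `{dataset}.events_*`
WHERE _TABLE_SUFFIX <= '{processed_until}'
UNION ALL
SELECT event_date, event_name, user_pseudo_id
FROM `{dataset}.events_intraday_*`
WHERE _TABLE_SUFFIX > '{processed_until}'
""".strip()

print(ga4_union_query("my-project.analytics_123456", "20260410"))
```

    Keep the caveat above in mind: the intraday side of this union will still be missing the first batch of events for consent-mode visitors.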

    Your experiences?

    Do you use intraday tables in your models? Have you found clever workarounds to get the correct data in?

    Let me know! Drop a comment here, or send me a bluesky message!

    Still here?

    Check out GA4Dataform – a product I’ve helped build that turns the GA4 BigQuery exports into usable tables!

    Related posts:

    • Google Analytics 4 truncates page location
    • Making sense of Event Parameters in GA4
    • Make your GA4 life easier: Some powertips!
    • Smart incremental GA4 tables in Dataform

    #bigQuery #consentMode #cookies #ga4 #tagging

  21. Today is DBA Appreciation Day!

    If you have a DBA in your company who relentlessly takes care that your databases are humming along and delivering query results, today is the day to say Thank You!

    #PostgreSQL #MySQL #MariaDB #Oracle #Greenplum #SQLite #SQLServer #MongoDB #Redis #Snowflake #DB2 #Elasticsearch #Teradata #InfluxDB #Firebird #Informix #Couchbase #CouchDB #Vertica #DuckDB #CockroachDB #SAPHana #Splunk #DynamoDB #BigQuery #Hive #Neo4j ...

    dbaday.org/

  22. #LookerStudio - Blog post

    Usually, BigQuery helps so much to create high-level Looker Studio reports.

    In this article it is the opposite: Looker Studio comes to the aid of BigQuery:
    How to explore the schemas of #BigQuery tables quickly with Looker Studio

    bit.ly/3u56gOB
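
    For readers who prefer SQL over a dashboard, BigQuery's INFORMATION_SCHEMA views expose the same schema information, and the result makes a convenient Looker Studio data source. A hedged Python sketch that only builds the query string; the project and dataset names are placeholders.

```python
def schema_query(project: str, dataset: str) -> str:
    """Build a query listing every column of every table in a dataset
    via BigQuery's INFORMATION_SCHEMA.COLUMNS view."""
    return (
        f"SELECT table_name, column_name, data_type, is_nullable\n"
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS`\n"
        f"ORDER BY table_name, ordinal_position"
    )

print(schema_query("my-project", "analytics"))
```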

  23. Tracking the #Fake #GitHub #Star #BlackMarket with #Dagster, #dbt and #BigQuery | #DagsterBlog

    "We knew there were dubious services out there offering #StarsForCash, so we set up a dummy repo (frasermarlow/tap-bls) and purchased a bunch of stars. From these, we devised a profile for fake accounts and ran a number of #repos through a test using the GitHub REST API (via pygithub) and the GitHub Archive database."

    dagster.io/blog/fake-stars

  24. It was about time... Finally!

    Item-scoped custom dimensions in #GA4 are being rolled out in explorations...

    No signs of them in #LookerStudio, #DataAPI or #BigQuery export yet...

  27. RT @isb_cgc: Proteomic Data Commons release V2.15 case, file, and quant data are now available in #BigQuery. Check out the new tables using our BigQuery Table Search tool at isb-cgc.appspot.com/bq_meta_se and filter Source by PDC.
    #proteome #NCIProteomics #CancerResearch #proteomic twitter.com/isb_cgc/status/162