home.social

#metadata — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #metadata, aggregated by home.social.

  1. Code as documentation: how we build self-documenting data marts at Почта Mail

    Big-data analytics has an old problem: the code of ETL data marts lives one life and the documentation lives another. You change the logic, forget to update a column description, and a month later nobody remembers what wallet_cards_category_hits means. At Почта Mail (VK) we solved this problem systematically by developing an internal framework that makes a data mart's code and its documentation inseparable. This is Dima Shveenkov. I still lead the analytics direction in the team and am responsible for data at Почта Mail, and now I am also responsible for the DWH at VK Tech. In previous articles I described in detail our data-driven approach to working with data and, in particular, how we work with Spark and which key data problems we solved in order to build our own data warehouse. Today I would like to focus on a narrower topic: how to keep documentation in order when your warehouse is as huge as ours. The piece is short, but I hope you will find it useful.

    habr.com/ru/companies/vktech/a

    #big_data #apache_spark #airflow #clickhouse #sql #документация #dwh #metadata #dbt #vk_tech

  2. What does it mean to be a DataCite member? Why should you consider joining? We answered these questions & more in our new Membership Essentials webinar series.
    ICYMI, watch the recording to learn about DataCite membership pathways, services, governance, fees & how to get started
    youtu.be/206BZd-5fA4
    #DataCite
    #PersistentIdentifiers
    #PIDs
    #DOI
    #OpenInfrastructure
    #OpenScience
    #ResearchData
    #ScholarlyCommunication
    #ResearchInfrastructure
    #Metadata
    #CommunityGovernance
    #Membership
    #PIDCommunity

  7. Apache Iceberg and its philosophy

    Iceberg and its metadata philosophy: we explain why Iceberg executes queries efficiently and makes data easy to manage thanks to its metadata.

    habr.com/ru/articles/1033546/

    #iceberg #metadata #data_lake #s3 #hdfs #data_lakehouse #acid #olap

  11. Spotted in my RSS feeds: Archivarix Tube Search. From the About page: “Web pages, including YouTube video pages, are routinely captured and preserved by public web archiving initiatives such as the Internet Archive (Wayback Machine) and Common Crawl. When a YouTube video becomes unavailable for any reason, the metadata that was previously captured by these archives — including titles, […]

    https://rbfirehose.com/2026/05/10/archivarix-tube-search/
  16. Dear 4Culture communities,
    our colleagues have been on tour once again 🛍️

    ✨ This time the Cultural Research Data Academy team took part in the meeting of the DHd-AG Referenzcurriculum in Berlin!

    👩‍🏫 There they presented our NFDI4Culture service, the "Educational Resource Finder", and also talked about metadata schemas.

    👏 Many thanks to the organizers, and for the engaging contributions and the exchange with everyone involved!

    #DH #metadata #OER #NFDIrocks

  17. 🌊 Day 2 of #HMCConference2026 is coming to a close on the Neckar River

    After a full day of sessions and discussions, the social event offers a chance to continue conversations in a more informal setting.

    It’s great to see how the exchange continues beyond the sessions and brings people together across disciplines.

    #HMC2026 #FAI #Metadata
    @rspace

  22. All mainstream messaging platforms share the same model: a single provider owns the servers, logic, and . Between political pressure and regulatory proposals like , such centralization is a liability.

    Learn from @morrolinux how Matrix, a secure, open, and decentralized network, flips this model: lpi.org/r2yu

    @matrix

  23. 🔎 **Archaeoanalysis Hackathon: Turn metadata into data!**
    Hackathon in cooperation with @dai_weltweit @WiNoDa @nfdi4objects
    🗓 30 Jun – 3 Jul 2026, IT‑Center, RWTH Aachen
    💾 1.5 M records, one dataset – bring your own laptop
    🥗 Catering provided, **free participation**!
    📊 Judging: data‑quality, reproducibility, novelty, reuse, communication
    🔗 more info & registration: dkz2r.de/events/2026-06-30_arc

    #RWTH #Hackathon #Archaeoanalysis #DataScience #Metadata #FreeEvent

  24. New progress note available (after 3 years during which I posted here on ActivityPub instead of on the blog):

    goffi.org/id/libervia-progress

    I'm talking about current work on installation/configuration simplification and the new forge, focus on web frontend and redesign and work done with current metadata reduction and serverless XMPP (Tor, contacts e2ee, new pubsub implementation).

    #Libervia #progress #XMPP #NLnet #NCI0 #Tor #pubsub #decentralized #forge #metadata #serverless

  25. Hey @gianni I was just chatting with @yawnbox about a wild range of tech things, and, since I am starting to host an Immich instance in my home, I am considering converting all my emails to JPEG XL, and I was wondering if there is a way to do so without risking the loss of any metadata.

    I am happy to read any links you might have but I need them to be as simple as possible because encoding/decoding stuff confuses me a lot. Not my thing… This is why I found your article explaining how AV1 works in simple words absolutely illuminating! Anyway, I just want to save storage space.

    Also, I am a big big fan of Aviator, and I have been following your work for a while. Thanks so much for all of it!

    #JXL #JPEGXL #JPEG #encoding #decoding #image #metadata #compression

  30. Now available: the recording and slides from our community webinar on configuring Dataverse to register DataCite DOIs. Explore DOI registration in Dataverse, tips for improving discoverability with metadata, and other practical guidance from the Dataverse Project & DataCite: youtube.com/watch?v=YuNgI7crG5M

    #Dataverse #DataCite #DOI #PersistentIdentifiers #PID #Metadata #ResearchData #OpenScience #ScholComm #DataManagement

  32. Last chance! Respond by 21 April to the internet consultations for 3 #OpenStandaarden

    🔐 OAuth 2.0 (v1.1) -> More secure authorization.
    ⚙️ OpenAPI Spec (v3.1) -> Uniform REST APIs.
    📊 DCAT-AP-NL (v3.0) -> Uniform metadata for datasets.

    Your expertise is needed to improve digital government.

    👉 Respond here: forumstandaardisatie.nl/nieuws

    #DigitaleOverheid #interoperabiliteit #API #metadata #overheid

    @DigitaleOverheid

  33. Do you know about the Collaborative Metadata (COMET) project?

    "We are working towards a future where #PID #metadata is not the sole burden of PID-creators.

    Our approach is informed and articulated by the community to ensure the maximal impact of scholarly metadata and #Research information." -- cometadata.org/

    As a COMET partner, PKP is pleased to share the invitation to the #COMET community meeting, April 22, 8 AM PDT / 3 PM UTC:

    datacite.zoom.us/meeting/regis

    @juancommander @datacite

  34. The whole point of persistent identifiers is that they *persist,* which is one reason ROR has always been committed to the Principles of Open Scholarly Infrastructure (POSI). Read our latest POSI self-assessment to learn how ROR works to ensure transparent and community-led governance, sustainability, and insurance. #PIDs #POSI #Metadata doi.org/10.71938/2xy9-em92

  35. Want to improve book info on Wikimedia? Join #EveryBookItsReader 2026

    Every Book Its Reader is a campaign that encourages everyone to improve the quality of content about books on Wikipedia, Wikidata, Wikimedia Commons, Wikibooks, and Wikisource. It usually runs through the whole month of April. You can go to the campaign website and follow the instructions to link your Wikimedia account to the campaign and thus have your contributions counted.

    This means you can create a new Wikipedia page for an author or book that doesn’t exist yet or, if you want to start with a less demanding task, you can search for Wikipedia articles about your favourite authors or books, read them, and add information or references for the information already published. You can also contribute to the other Wikimedia platforms, like Commons, Wikibooks, or Wikisource (if you’re uploading an item, be sure to check whether you hold the copyright to the work or whether it’s in the public domain).

    Another, easier option (at least for me) is to contribute to Wikidata, a wiki of structured data. This means that once the data is there, you can ask (create queries) about what you want to know. Some examples:

    You can also use the easier visual query builder here. But to ask questions, we need the data there.

    This year I thought I would add information about Elizabeth Fair’s books to Wikidata. There was already an item for the author, but not for her books. I started by creating an item for the work Bramton Wick, published in 1952. But I also wanted to add the 2017 edition by Dean Street Press, so I added a new item for that edition (one work can have several editions). And I wanted to describe it as fully as possible: that it was published by DSP (there was no info about it, so I created a new item for the publisher), in the Furrowed Middlebrow collection (for which I also created an item), with an introduction by Elizabeth Crawford (who was already on Wikidata, so I linked to her directly). In the end, I went to the item about Elizabeth Fair, which was already on Wikidata, and was able to link Bramton Wick to her notable works. I’m linking all the Wikidata items here, so if you have more info, you can go there and add to them.

    So I’m hoping to find some time during this month to add at least Fair’s other books (yeah, I know you can tell I love her books 😍).

    In the 2024 campaign, I added information to Wikidata about (autolink, in Portuguese) titles of Agatha Christie’s books to solve a problem that I (and probably many others who read in more than one language) face: the same book can have very different titles, which means that you can find what seems to be a new-to-you book by a given author when it is just a different title for a book you already own or have read.

    Steven from @christie_in_translation on Instagram regularly shares different countries’ editions of Agatha Christie’s books and reflects on the different translations of her titles. In Christie’s case, we even have the same book in the same language (English) with a different title, depending on whether it was published in the UK or the US.

    So this year I decided to extend it to new authors and I’m using a Portuguese collection of crime fiction (Colecção Vampiro) to add the Portuguese titles to the original items’ titles in Wikidata.

    As you can see, you can go from simple to more complex contributions to Every Book Its Reader, and each one is as important as the others. So, why not give it a go?

    #AgathaChristie #books #ColecçãoVampiro #CrimeFiction #DeanStreetPress #ElizabethFair #EveryBookItsReader #fiction #FurrowedMiddlebrow #Metadata #reading #Technology #Wikidata #Wikimedia
  36. 🚀 Operando4NeXus Kick-off in Berlin!

    The #HMCproject #Operando4NeXus has started. Partners met in #Berlin to align on goals and kick off collaboration.

    The project advances #interoperability and #metadata integration in the #NeXus standard, supporting #FAIRdata in photon & neutron science.

    Great exchange with @fairmat_nfdi @DAPHNE4NFDI & #SECoP@HMC — metadata is teamwork!

    👉 helmholtz-metadaten.de/inf-pro

    #OpenScience #ResearchData

    @helmholtz @hzbde @HZDR @fzj @KIT_Karlsruhe @DESY #PSI #ESS