home.social

#informationextraction — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #informationextraction, aggregated by home.social.

  1. Very nice presentation on automatic metadata extraction from a correspondence corpus by Sabrina Strutz (Graz).

    Careful work and evaluation, with a critical review of the ground-truth (GT) data (which information from the print edition is in there that cannot actually be derived from the letter itself?) and a breakdown of result quality by task (author/place recognition) and phase (generation of candidates and selection of the final proposal).

    Qwen3-14B-Q6 as a local model is worse than Sonnet 4.6 (which delivers very good results but is also the most expensive) and GPT 5.2, but its results are not bad at all. (And better with reasoning switched off!)

    All models struggle to infer places of writing from the text when they are not named in the date line.

    #DHd2026 #InformationExtraction #LLM

  2. 🤖🔧 Apparently, structured outputs are the latest "sliced bread" of #AI, but turns out they're just fancy-shmancy wrappers that make your LLM dumber than a bag of hammers 🤦‍♂️. Who knew that squeezing responses into neat little boxes could actually lead to a train-wreck of information extraction? 🚂💥
    boundaryml.com/blog/structured #Innovation #AI #Limitations #LLMs #InformationExtraction #TechHumor #StructuredOutputs #HackerNews #ngated

  3. Today, I'm at the Bundesarchiv in Koblenz for the Strategy & Planning meeting of our project "Wiedergutmachung". Our task in this project is to develop efficient information extraction from historical case files of the German postwar compensation process for National Socialist injustice.

    @fiz_karlsruhe @fizise @LandesarchivBW #bundesarchiv @bmf #knowledgegraphs #llms #AI #informationextraction #archives #project @ddbkultur @archivportal @MahsaVafaie @fschwic

  8. Open PhD/Junior Researcher Position in Neurosymbolic AI and Information Extraction on historical documents at FIZ Karlsruhe - Knowledge-driven AI research group (formerly the ISE research group), starting on Jan 1, 2026.
    Application Deadline: Oct 31, 2025
    fiz-karlsruhe.de/en/stellenanz

    #jobadvertisement #phd #AI #neurosymbolicAI #informationextraction #machinelearning #knmowledgegraphs #ontologies @fiz_karlsruhe @fizise #dh #culturalheritage @nfdi4culture @MahsaVafaie @tabea @sourisnumerique @enorouzi

  12. Our colleague Hidir Arras from patent4science research is co-organizing the 6th PatentSemTech Workshop at #SIGIR2025 in the beautiful city of Padua, Italy! Call for Papers is open 'til April 23: ifs.tuwien.ac.at/patentsemtech

    Submit your cutting-edge research, case studies, and demos exploring #AI, #NLP, and #TextMining innovations applied to #IP and related domains.

    @fiz_karlsruhe #informationextraction #datamining #ir

  13. We currently have two fully-funded open PhD positions in our group with a focus on #NLProc, #InformationExtraction and #TextGeneration. I can really recommend both the group as well as Philipp Cimiano as a supervisor, so take this opportunity!

    NLP/Text Generation
    EN: uni-bielefeld.hr4you.org/job/v
    DE: uni-bielefeld.hr4you.org/job/v

    NLP/Information Extraction
    EN: uni-bielefeld.hr4you.org/job/v
    DE: uni-bielefeld.hr4you.org/job/v

    If you have any questions, do not hesitate to contact me or Philipp directly!

  14. ReadMe2KG: Github ReadMe to Knowledge Graph #Challenge has been published as part of the Natural Scientific Language Processing and Research Knowledge Graphs #NSLP2025 workshop co-located with #eswc2025. This #NER task aims to complement the NFDI4DataScience KG via information extraction from GitHub README files.

    task description: nfdi4ds.github.io/nslp2025/doc
    website: codabench.org/competitions/539

    @eswc_conf @GenAsefa @shufan @NFDI4DS #NFDIrocks #knowledgegraphs #semanticweb #nlp #informationextraction

  19. Today I'm attending the @Textplus #TextplusPlenary2024 at Schloss Mannheim @unimannheim.
    I will present a poster about our work on #InformationExtraction from tables in old German magazines.

  20. GOT (General OCR Theory) is a 580M end-to-end OCR-2.0 model now on Hugging Face 🤗

    "GOT consists of a Vision-Encoder to convert images into tokens and a decoder for generating OCR outputs in various formats (e.g., plain text, markdown, Mathpix). GOT is designed to handle complex tasks like sheets, formulas, and geometric shapes."

    Model: huggingface.co/ucaslcl/GOT-OCR
    GitHub: github.com/Ucas-HaoranWei/GOT-
    paper: arxiv.org/abs/2409.01704

    #ocr #transformers #informationextraction #ai

  21. Yesterday after the successful PhD defense of Nicolas Heist at University of Mannheim together with Chris Bizer and Heiko Paulheim. Congratulations Dr. Nico!

    Thesis title: Exploiting semi-structured information in Wikipedia for Knowledge Graph Construction
    uni-mannheim.de/dws/news/nicol

    #knowledgegraphs #wikipedia #dbpedia #informationextraction #llms @fiz_karlsruhe @fizise @KIT_Karlsruhe

  22. Nice overview of LLMs for data annotation, including references to papers with open-source code & data.
    Zhen Tan et al., Large Language Models for Data Annotation: A Survey, arxiv.org/abs/2402.13446

    #llms #generativeai #informationextraction #dataannotation