home.social

#ismbeccb2023 — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #ismbeccb2023, aggregated by home.social.

  1. We had an excellent #CompMS session at the #ISMBECCB2023 conference last week.

    Many thanks to keynote speakers @[email protected], @[email protected], and @[email protected]; all selected speakers; and poster presenters for showcasing the latest computational advances in mass spectrometry, with applications across #proteomics, #metabolomics, #lipidomics, and more.

  2. RT @: Keep calm, Pfam is still running! But now it's hosted on the InterPro website! At #ISMBECCB2023, we had the opportunity to learn more about @PfamDB and its integration with the @InterProDB website. We even won these really cool t-shirts. Thanks!

  3. Mark Gerstein at #ISMBECCB2023: Deep learning is exciting, but let's not forget about the physical and biological models underlying the science we're interested in. Let's make biomedical data science more like weather forecasting.

  4. Névéol: What can we do?
    Understand the stakes better.
    Facilitate levers like data sharing, shared tasks, and policy.
    Write more documentation, for protocols, etc.; elicit audits.

    See Cohen-Boulakia et al. 2017, Future Gen Comput Syst

    #ismbeccb2023
    #textmining

  5. Aurélie Névéol:
    How can we make clinical NLP more reproducible? Can NLP also help with reproducibility? Even word or sentence tokenization can be inconsistent. Most NLP folks have, at least once, failed to repeat someone else's experiment, or even their own. Sometimes it's due to differences in preprocessing, software versions, training vs test splits, or other boring things. Availability issues, page limits, and the bias toward novelty don't help either.

    #ismbeccb2023
    #textmining

  6. One perk of attending #ISMBECCB2023 virtually: watching the recording of a keynote I missed instead of the talk I had planned to watch, which turned out not to interest me.

    (I guess you could also plug in your headphones and do the same if you're there in person, but that's noticeably ruder.)

  7. Sylwia Szymanska: Word embeddings capture functions of low complexity regions: scientific literature analysis using a transformer-based language model

    Low-complexity regions in proteins are biologically important. But there isn't a database or even a list of these relationships. So let's extract them with a language model.
    #ismbeccb2023
    #textmining

  8. Brett Beaulieu-Jones: Can we use large language models with clinical notes to estimate likelihood of seizure recurrence? Yes - and even with good results - but models are difficult to interpret. So can we build a model that includes things we really care about, then add an instructable layer? Yes! Use note metadata as weak supervision -> instructions for the model. A tuned Flan-T5 model does really well.

    #ismbeccb2023
    #textmining

  9. Robert Leaman: BioNER requires multiple entity types -> relation extraction. But human-annotated NER data are scarce - < 0.01% of PubMed articles. Can use a pretrained language model to do multitask NER... but instead we could modify the data. Include annotations for negative mentions by type + tokens for sentence start/end.
    AIONER converts training data to this form and aggregates data sets. Moderate improvement on most BioNER types.

    Repo here: github.com/ncbi/AIONER

    #ismbeccb2023
    #textmining

  10. Katerina Nastou: Benchmarking species name NER is something the S800 corpus was used for, but otherwise well-performing models were doing poorly on it. Problem? Annotation inconsistencies in S800. It's been manually revised using stricter rules, covering just species, strain, and genus names (each with their own tags). 200 more documents too, so now it's S1000.

    How do NER models do on it? F1 up around 89 to 91.

    Get corpus at zenodo.org/record/7064902

    #ismbeccb2023
    #textmining

  11. Esmaeil Nourani: Health involves lifestyle factors. Can we extract relations connecting those to disease?
    Developing a draft lifestyle ontology. Started with 869 concepts across multiple branches. Needed to get synonyms, too - embeddings helped with that, and also allowed discovery of new candidate terms. Full draft is now 1652 concepts. Ready for NER and RE.

    #ismbeccb2023
    #textmining

  12. Krallinger: Organizing shared tasks. Some processes can take years. Examples - CANTEMIST, CodiEsp, MESINESP, MEDDOCAN, MEDDOPROF, ClinSpEn, DisTEMIST. Most recently MEDDOPLACE, PharmaCoNER
    #ismbeccb2023
    #textmining

  13. Krallinger: It's important to engage clinical experts from the beginning. That includes their considerations on the content sources.

    Annotation guidelines are necessary. See the guides at zenodo.org/communities/medical
    Translating these to languages beyond English helps the community.

    #ismbeccb2023
    #textmining

  14. Krallinger: Developing language models for clinical data in Spanish. Since clinical text varies so much in structure and content, you need a balance between general language and domain-specific optimization. Need some clear annotation guidelines too.

    Really need a set of clear clinical use cases, too.
    #ismbeccb2023 #textmining

  15. Hi #ismbeccb2023.
    I'm in Text Mining today.

    Martin Krallinger: Unstructured text from clinical narratives is still underused. There are many other text sources too, like patient forums or drug leaflets, but clinical narratives are especially difficult. No out-of-the-box NLP solution works. Need data, infrastructure, and reproducible benchmarks.

    #textmining

  16. As I’m heading back to London from #BOSC2023, I can only reflect on how much this conference means to me. If @BOSC didn’t exist my professional (and personal!) life would look quite different and I’d be a lot poorer for it.

    It was a blast to meet so many old friends again and make new ones, amongst a community of folks so dedicated to and passionate about open science/data/source and doing ethical research more broadly.

    #ISMBECCB2023

  17. * Zachary Flamholz: Unannotated *viral* proteins. There are many of them, and annotation is usually done by homology. See the PHROGs database of phage genomes - representations of these sequences can accurately identify functional category. Also enables identifying some novel protein families.

    See researchsquare.com/article/rs-

    #ismbeccb2023

  18. * Miguel Fernández Martín: Comparing bacterial protein interactomes to find antibiotic resistance genes. (Back In My Day, we did this with a lot of Y2H). An adaptation of ContextMirror that takes coevolutionary context into account should work. Spoiler: it does. Likely a good way to assemble experimental interactomes with better guidance.
    #ismbeccb2023

  19. Back to Function!
    * Aysun Urhan: What to do with proteins of unknown function? A new species -> new genes. We can make protein sequence embeddings to try to infer homology, though most embedding approaches so far haven't focused on bacteria. Use what we know about operons (including predicting if they haven't been confirmed) and combine with protein embeddings. Then assign GO terms w/ cosine similarity. This does work better than using AAs alone.

    See github.com/AbeelLab/sap

    #ismbeccb2023

  20. * Blessy Antony: What causes a virus to switch hosts? Can we computationally predict the potential host species and necessary mutations? Focusing on mammal + bird viral hosts, a transformer encoder model appears to do best in predicting host switch. It's easier with species we have more examples for. For mutation prediction, sequence embeddings suggest some potential zoonosis-related sequences.
    #ismbeccb2023

  21. * Xiaoyue Cui: Multidomain proteins - they contain families of domains with variable architecture. There are constraints on that architecture, though - it isn't completely random. It can be simulated (see DomArchov, cs.cmu.edu/~durand/DomArchov/). So why not use some natural language processing methods? Embeddings, yes, with Word2Vec. Can get the TF-IDF too. Works well for comparative analysis.
    #ismbeccb2023

  22. Back in EvolCompGen.
    * Milana Frenkel-Morgenstern: A new database called PASTORAL (Protein-protein interActions of Stress-response genes in subTerranean and fossORial AnimaLs). Animals in this DB are diverse but share ecological contexts. Clustering protein interactions shows they break down based on shared ecology of their source species.
    #ismbeccb2023

  23. In the Education track.
    Stephen Piccolo: "End user" researchers may just need small scripts and tools. Basic bioinformatics courses teach that. How does ChatGPT handle an entry-level programming prompt? Can it filter data? Make plots? Out of 184 prompts, it passed 139 on the first try. It had some difficulty identifying CpG islands, writing regexes, and interpreting longer prompts.

    Slides: docs.google.com/presentation/d

    #ismbeccb2023

  24. Salvatore Cosentino: SonicParanoid is a tool for orthology prediction over proteomes. This can otherwise be a slow process, but SonicParanoid2 speeds it up with an AdaBoost model and by ignoring pairs unlikely to be orthologous. To find domains, there's a Doc2Vec model. Working at the domain level makes it easier to find fusion proteins. Someone ran 1000 animal proteomes on this tool and it finished...eventually?

    See Cosentino and Iwasaki 2023 bioRxiv doi.org/10.1101/2023.05.14.540

    #ismbeccb2023

  25. At #ismbeccb2023, #COSI Function track. Hey @iddux, you are missed here, mate

  26. 1/8 - #BOSC2023 #ISMBECCB2023 Summary

    The 24th annual Bioinformatics Open Source Conference, BOSC 2023 (open-bio.org/events/bosc-2023), kicked off with a welcome from chair Nomi Harris, an overview of BOSC’s parent organization, the Open Bioinformatics Foundation, and a report about the pre-BOSC CollaborationFest (#CoFest), a collaborative work event (including but not limited to hacking) hosted by the nearby ENS de Lyon.

  27. Here at #ismbeccb2023 at the #COSI 3DSig, enjoying this morning session!

  28. RT @: Today at 4PM Alan Bridge of #SwissProt presents the efforts at @UniProt-KB to improve knowledge representation with this resource, using language models in #AI for expert curation, and the role of expert biocurators in the session “Knowledge & Impact from Data”. #ISMBECCB2023

  29. We're super-excited for our 24th annual BOSC, #BOSC2023, starting today (July 24) as part of #ISMBECCB2023!

    Don't miss our opening keynote by Sara El-Gebali (@yalahowy), after a brief "Welcome to BOSC" session including @OpenBio and #CoFest overviews.