#ismbeccb2023 — Public Fediverse posts on home.social

GigaScience Journal @[email protected] · 2023-08-07 · 09:06 UTC

Write-up of #ISMBECCB2023 and our 11th birthday celebrations in Lyon.

Going Large (Language Models) at ISMB2023/@[email protected] http://gigasciencejournal.com/blog/going-large-language-models-at-ismb2023/

#ismbeccb2023

Wout Bittremieux @[email protected] · 2023-08-01 · 07:46 UTC

We had an excellent #CompMS session at the #ISMBECCB2023 conference last week.

Many thanks to keynote speakers @[email protected], @[email protected], and @[email protected]; all selected speakers; and poster presenters for showcasing the latest computational advances in mass spectrometry, with applications across #proteomics, #metabolomics, #lipidomics, and more.

#compms #ismbeccb2023 #proteomics #metabolomics #lipidomics

PastelBio @[email protected] · 2023-07-31 · 10:40 UTC

RT @: Keep calm, Pfam is still running! But now it's hosted on the InterPro website! At #ISMBECCB2023, we had the opportunity to learn more about @PfamDB and its integration with the @InterProDB website. We even won these really cool t-shirts. Thanks!

#ismbeccb2023

Chloé Azencott @[email protected] · 2023-07-27 · 15:22 UTC

Mark Gerstein at #ISMBECCB2023: Deep learning is exciting, but let's not forget about the physical and biological models underlying the science we're interested in. Let's make biomedical data science more like weather forecasting.

#ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 11:46 UTC

Névéol: What can we do?
Understand the stakes better.
Facilitate levers like data sharing, shared tasks, and policy.
Write more documentation, for protocols, etc.; elicit audits.

See Cohen-Boulakia et al 2017 Future Gen Comput Syst

#ismbeccb2023
#textmining

#textmining #ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 11:36 UTC

Aurélie Névéol:
How can we make clinical NLP more reproducible? Can NLP also help with reproducibility? Even word or sentence tokenization can be inconsistent. Most NLP folks have, at least once, failed to repeat someone else's experiment, or even their own. Sometimes it's due to differences in preprocessing, software versions, training vs test splits, or other boring things. Availability issues, page limits, and the bias toward novelty don't help either.

#ismbeccb2023
#textmining

#textmining #ismbeccb2023
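
The tokenization point in the post above can be illustrated with a minimal sketch. Both tokenizers and the sample note are made up for illustration and are not taken from the talk; the only claim is that two reasonable-looking rules split the same clinical-style sentence differently, which is enough to change downstream counts.

```python
import re

def whitespace_tokens(text):
    # Rule 1: split on runs of whitespace only; punctuation stays attached.
    return text.split()

def regex_tokens(text):
    # Rule 2: runs of word characters, or any single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

# Hypothetical note text, invented for this example.
note = "Pt. given 2.5 mg q.d.; no seizure-like activity."

a = whitespace_tokens(note)
b = regex_tokens(note)
print(len(a), a)
print(len(b), b)
```

The first rule keeps "2.5" and "seizure-like" whole, while the second breaks them into several pieces, so two pipelines that both claim to "tokenize" produce different inputs for the same model.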

Chloé Azencott @[email protected] · 2023-07-27 · 10:35 UTC

One perk of attending #ISMBECCB2023 virtually: watching the recording of a keynote I missed instead of the talk I had planned to watch but turned out not to be interested in.

(I guess you could also plug in your headphones and do the same if you're there in person, but that's noticeably ruder.)

#ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 09:03 UTC

Sylwia Szymanska: Word embeddings capture functions of low complexity regions: scientific literature analysis using a transformer-based language model

Low-complexity regions in proteins are biologically important. But there isn't a database or even a list of these relationships. So let's extract them with a language model.
#ismbeccb2023
#textmining

#textmining #ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 08:51 UTC

Brett Beaulieu-Jones: Can we use large language models with clinical notes to estimate likelihood of seizure recurrence? Yes - and even with good results - but models are difficult to interpret. So can we build a model that includes things we really care about, then add an instructable layer? Yes! Use note metadata as weak supervision -> instructions for the model. A tuned T5-Flan model does really well.

#ismbeccb2023
#textmining

#textmining #ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 08:39 UTC

Robert Leaman: BioNER requires multiple entity types -> relation extraction. But human-annotated NER data are scarce - < 0.01% of PubMed articles. Can use a pretrained language model to do multitask NER...but instead we could modify the data. Include annotations for negative mentions by type + tokens for sentence start/end.
AIONER converts training data to this form and aggregates data sets. Moderate improvement on most BioNER types.

Repo here: https://github.com/ncbi/AIONER

#ismbeccb2023
#textmining

#textmining #ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 08:18 UTC

Katerina Nastou: Benchmarking species name NER is something the S800 corpus was used for, but otherwise well-performing models were doing poorly on it. Problem? Annotation inconsistencies in S800. It's been manually revised using stricter rules, and just species, strain, and genera names (each with their own tags). 200 more documents too, so now it's S1000.

How do NER models do on it? F1 up around 89 to 91.

Get corpus at https://zenodo.org/record/7064902

#ismbeccb2023
#textmining

#textmining #ismbeccb2023
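
For readers unfamiliar with how an NER F1 like the one quoted above is computed, here is a minimal sketch. It assumes exact-match scoring on (start, end, type) spans, which is a common convention but not something the post confirms for S1000; the spans are invented, and the result is on a 0 to 1 scale rather than the 0 to 100 scale used in the post.

```python
def span_f1(gold, predicted):
    """Exact-match F1 over (start, end, entity_type) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotations for one document.
gold = {(0, 16, "species"), (30, 37, "genus"), (50, 62, "strain")}
predicted = {
    (0, 16, "species"),
    (30, 37, "species"),   # right boundaries, wrong type: counts as an error
    (50, 62, "strain"),
    (70, 75, "species"),   # spurious mention
}

score = span_f1(gold, predicted)
print(round(score, 3))
```

Because the type is part of the match, a span tagged as species instead of genus counts against both precision and recall, which is one way annotation inconsistencies in a corpus can depress scores.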

Harry Caufield @[email protected] · 2023-07-27 · 07:21 UTC

Esmaeil Nourani: Health involves lifestyle factors. Can we extract relations connecting those to disease?
Developing a draft lifestyle ontology. Started with 869 concepts across multiple branches. Needed to get synonyms, too - embeddings helped with that, and also allowed discovery of new candidate terms. Full draft is now 1652 concepts. Ready for NER and RE.

#ismbeccb2023
#textmining

#textmining #ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 07:01 UTC

Krallinger: Organizing shared tasks. Some processes can take years. Examples - CANTEMIST, CodiEsp, MESINESP, MEDDOCAN, MEDDOPROF, ClinSpEn, DisTEMIST. Most recently MEDDOPLACE, PharmaCoNER
#ismbeccb2023
#textmining

#textmining #ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 06:52 UTC

Krallinger: It's important to engage clinical experts from the beginning. That includes their considerations on the content sources.

Annotation guidelines are necessary. See the guides at http://zenodo.org/communities/medicalnlp
Translating these to languages beyond English helps the community.

#ismbeccb2023
#textmining

#textmining #ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 06:47 UTC

Krallinger: Developing language models for clinical data in Spanish. Since clinical text varies so much in structure and content, you need a balance between general language and domain-specific optimization. Need some clear annotation guidelines too.

Really need a set of clear clinical use cases, too.
#ismbeccb2023 #textmining

#textmining #ismbeccb2023

Harry Caufield @[email protected] · 2023-07-27 · 06:46 UTC

Hi #ismbeccb2023.
I'm in Text Mining today.

Martin Krallinger: Unstructured text from clinical narratives is still underused. There are many other text sources too, like patient forums or drug leaflets, but clinical narratives are especially difficult. No out of the box NLP solution works. Need data, infrastructure, and reproducible benchmarks.

#textmining

#ismbeccb2023 #textmining

António Domingues @[email protected] · 2023-07-26 · 19:12 UTC

Oh today I saw more alternative splicing goodies at #ismbeccb2023 #ismb2023

#ismbeccb2023 #ismb2023

Bastian Greshake Tzovaras @[email protected] · 2023-07-26 · 15:58 UTC

As I’m heading back to London from #BOSC2023, I can only reflect on how much this conference means to me. If @BOSC didn’t exist my professional (and personal!) life would look quite different and I’d be a lot poorer for it.

It was a blast to meet so many old friends again and make new ones, amongst a community of folks being so dedicated to and passionate about open science/data/source and doing ethical research more broadly.

#ISMBECCB2023

#ismbeccb2023 #bosc2023

Harry Caufield @[email protected] · 2023-07-26 · 15:53 UTC

* Zachary Flamholz: Unannotated *viral* proteins. There are many of them, and annotation is usually done by homology. See the PHROGs database of phage genomes - representations of these sequences can accurately identify functional category. Also enables identifying some novel protein families.

See https://www.researchsquare.com/article/rs-2852098/v1

#ismbeccb2023

Harry Caufield @[email protected] · 2023-07-26 · 15:41 UTC

* Miguel Fernández Martín: Comparing bacterial protein interactomes to find antibiotic resistance genes. (Back In My Day, we did this with a lot of Y2H). An adaptation of ContextMirror that takes coevolutionary context into account should work. Spoiler: it does. Likely a good way to assemble experimental interactomes with better guidance.
#ismbeccb2023

#ismbeccb2023

Harry Caufield @[email protected] · 2023-07-26 · 15:32 UTC

Back to Function!
* Aysun Urhan: What to do with proteins of unknown function? A new species -> new genes. We can make protein sequence embeddings to try to infer homology, though most embedding approaches so far haven't focused on bacteria. Use what we know about operons (including predicting if they haven't been confirmed) and combine with protein embeddings. Then assign GO terms w/ cosine similarity. This does work better than using AA's alone.

See https://github.com/AbeelLab/sap

#ismbeccb2023
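
The "assign GO terms w/ cosine similarity" step in the post above can be sketched roughly as follows. This is not the code from the linked repository: the function names, the 0.8 threshold, and the three-dimensional toy embeddings are all assumptions for illustration, and the operon information the talk combines with the embeddings is left out entirely.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def transfer_go_terms(query, annotated, min_similarity=0.8):
    """Copy GO terms from the most similar annotated protein.

    annotated maps protein id -> (embedding, set of GO terms).
    Returns (terms, similarity); terms is empty below the threshold.
    """
    best_id, best_sim = None, -1.0
    for protein_id, (embedding, _terms) in annotated.items():
        sim = cosine(query, embedding)
        if sim > best_sim:
            best_id, best_sim = protein_id, sim
    if best_id is None or best_sim < min_similarity:
        return set(), best_sim
    return set(annotated[best_id][1]), best_sim

# Toy reference set: hypothetical proteins with made-up embeddings.
annotated = {
    "protA": ([1.0, 0.0, 0.0], {"GO:0003677"}),
    "protB": ([0.0, 1.0, 0.0], {"GO:0016301"}),
}

terms, similarity = transfer_go_terms([0.9, 0.1, 0.0], annotated)
print(terms, round(similarity, 3))
```

A query that is not close to any annotated protein, such as `[1.0, 1.0, 1.0]` here, falls below the threshold and gets no terms, which is the conservative choice when transferring annotations.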

Harry Caufield @[email protected] · 2023-07-26 · 15:08 UTC

* Blessy Antony: What causes a virus to switch hosts? Can we computationally predict the potential host species and necessary mutations? Focusing on mammal + bird viral hosts, a transformer encoder model appears to do best in predicting host switch. It's easier with species we have more examples for. For mutation prediction, sequence embeddings suggest some potential zoonosis-related sequences.
#ismbeccb2023

#ismbeccb2023

Harry Caufield @[email protected] · 2023-07-26 · 14:56 UTC

* Xiaoyue Cui: Multidomain proteins - they contain families of domains with variable architecture. There are constraints on that architecture, though - it isn't completely random. It can be simulated (see DomArchov, http://www.cs.cmu.edu/~durand/DomArchov/). So why not use some natural language processing methods? Embeddings, yes, with Word2Vec. Can get the TF-IDF too. Works well for comparative analysis.
#ismbeccb2023

#ismbeccb2023
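
The idea of treating domain architectures like text, mentioned in the post above, can be sketched with TF-IDF: each protein is a "document" and each domain is a "word". The proteins and domain lists below are invented, and the weighting is the plain textbook variant (term frequency times log inverse document frequency), which may differ from what the speaker used.

```python
import math
from collections import Counter

def tf_idf(architectures):
    """Compute TF-IDF weights for domains within each protein.

    architectures maps protein id -> ordered list of domain names.
    """
    n_proteins = len(architectures)
    doc_freq = Counter()
    for domains in architectures.values():
        doc_freq.update(set(domains))  # count each domain once per protein
    weights = {}
    for protein, domains in architectures.items():
        counts = Counter(domains)
        weights[protein] = {
            domain: (count / len(domains)) * math.log(n_proteins / doc_freq[domain])
            for domain, count in counts.items()
        }
    return weights

# Hypothetical multidomain proteins.
architectures = {
    "p1": ["SH3", "SH2", "Kinase"],
    "p2": ["SH2", "Kinase"],
    "p3": ["PDZ", "PDZ", "SH3"],
}

weights = tf_idf(architectures)
for protein, scores in weights.items():
    print(protein, {d: round(w, 3) for d, w in scores.items()})
```

In this toy set the repeated PDZ domain, which appears in only one protein, gets the highest weight, while domains shared across proteins are down-weighted, so the resulting vectors emphasize what is distinctive about each architecture.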

Harry Caufield @[email protected] · 2023-07-26 · 14:32 UTC

Back in EvolCompGen.
* Milana Frenkel-Morgenstern: A new database called PASTORAL (Protein-protein interActions of Stress-response genes in subTerranean and fossORial AnimaLs). Animals in this DB are diverse but share ecological contexts. Clustering protein interactions shows they break down based on shared ecology of their source species.
#ismbeccb2023

#ismbeccb2023

Harry Caufield @[email protected] · 2023-07-26 · 14:15 UTC

In the Education track.
Stephen Piccolo: "End user" researchers may just need small scripts and tools. Basic bioinformatics courses teach that. How does ChatGPT handle an entry-level programming prompt? Can it filter data? Make plots? Out of 184 prompts, it passed 139 on the first try. It had some difficulty identifying CpG islands, writing regexes, and interpreting longer prompts.

Slides: https://docs.google.com/presentation/d/1IRMrQmyCS7t7SNbmGuUA1lmAJRvvUA7bFalPi_8LuB0/edit?usp=drivesdk

#ismbeccb2023

Harry Caufield @[email protected] · 2023-07-26 · 13:31 UTC

Salvatore Cosentino: SonicParanoid is a tool for orthology prediction over proteomes. This can otherwise be a slow process, but SonicParanoid2 speeds it up with an AdaBoost model and by ignoring pairs unlikely to be orthologous. To find domains, there's a Doc2Vec model. Working at the domain level makes it easier to find fusion proteins. Someone ran 1000 animal proteomes on this tool and it finished...eventually?

See Cosentino and Iwasaki 2023 bioRxiv https://doi.org/10.1101/2023.05.14.540736

#ismbeccb2023

AnaRojas @[email protected] · 2023-07-26 · 13:28 UTC

At #ismbeccb2023, #COSI Function track. Hey @iddux, you are missed here, mate

#cosi #ismbeccb2023

BOSC (OpenBio's Conference) @[email protected] · 2023-07-26 · 08:12 UTC

1/8 - #BOSC2023 #ISMBECCB2023 Summary

The 24th annual Bioinformatics Open Source Conference, BOSC 2023 (open-bio.org/events/bosc-2023), kicked off with a welcome from chair Nomi Harris, an overview of BOSC’s parent organization, the Open Bioinformatics Foundation, and a report about the pre-BOSC CollaborationFest (#CoFest), a collaborative work event (including but not limited to hacking) hosted by the nearby ENS de Lyon.

#bosc2023 #ismbeccb2023 #cofest

Genetics Ecology and Evolution @[email protected] · 2023-07-26 · 07:40 UTC

Interested in what we have done with many of the #bioinformatics tools you and people like you have released? #ismbeccb2023
What happens when recombination stops, the tempo of #seXYevol today at #EvolCompBiol Pasteur auditorium 11h30
#TE #sequence #genome #degeneration #molecular #dating #denovo #genes #EvolutionaryBiology
https://gitlab.com/marine.c.duhamel/microtep
https://www.biorxiv.org/content/10.1101/2022.08.03.502670v2.full #preprint version final version coming out soon
https://onlinelibrary.wiley.com/doi/full/10.1111/jeb.13991 @JEB https://academic.oup.com/mbe/article/39/4/msac060/6553583 @officialSMBE

#bioinformatics #ismbeccb2023 #sexyevol #evolcompbiol #te #sequence

AnaRojas @[email protected] · 2023-07-25 · 10:19 UTC

Here at #ismbeccb2023 at the #COSI 3DSig, enjoying this morning session!

#cosi #ismbeccb2023

PastelBio @[email protected] · 2023-07-24 · 13:40 UTC

RT @: Today at 4PM Alan Bridge of #SwissProt presents the efforts at @UniProt-KB to improve knowledge representation with this resource, using language models in #AI for expert curation, and the role of expert biocurators in the session “Knowledge & Impact from Data”. #ISMBECCB2023

#swissprot #ai #ismbeccb2023

BOSC (OpenBio's Conference) @[email protected] · 2023-07-24 · 07:42 UTC

We're super-excited for our 24th annual BOSC, #BOSC2023, starting today (July 24) as part of #ISMBECCB2023!

Don't miss our opening keynote by Sara El-Gebali (@yalahowy), after a brief "Welcome to BOSC" session including
@OpenBio and #CoFest overviews.

#bosc2023 #ismbeccb2023 #cofest