home.social

#utf — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #utf, aggregated by home.social.

  1. 🆕 blog! “A small collection of text-only websites”

    A couple of years ago, I started serving my blog posts as plain text. Add .txt to the end of any URL and get a deliciously lo-fi, UTF-8, mono[chrome|space] alternative.

    Here's this post in plain text - shkspr.mobi/blog/2025/12/a-sma

    Obviously a webpage…

    👀 Read more: shkspr.mobi/blog/2025/12/a-sma

    #blogging #blogs #text #unicode #utf-8

  2. Recently, we talked about #libid3tag and our intent to make a new release. So far, we have a preview of some changes that have already been made in the latest main:

    - Mojibake fixes for #UTF-16 (no BOM) encoded fields.
    - Some code cleanups, including warning fixes.
    - Compatibility with #CMake > 4.0 (we now require CMake 3.10+)

    Meanwhile, we are also working on #Doxygen documentation to better document the library, so quite a few things are going on for libid3tag right now.

  3. UTF-8 Is Beautiful - It’s likely that many Hackaday readers will be aware of UTF-8, the mechanism for i... - hackaday.com/2025/09/14/utf-8- #softwarehacks #characterset #utf-8

  4. Very cool, copy-paste UTF text from, e.g., Wikipedia, get Unicode.
    Sanskrit अश्विन्
    can be in your HTML as
    &#x905;&#x936;&#x94D;&#x935;&#x93F;&#x928;&#x94D;
    r12a.github.io/app-conversion/
    #UTF #Unicode #conversion
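As an illustration of the kind of conversion the r12a tool performs, here is a minimal Python sketch (not the app's actual code) that escapes each code point of a string as an HTML hex numeric character reference:

```python
# Escape every code point as an HTML hex numeric character
# reference, e.g. "अ" -> "&#x905;".
def to_ncrs(text: str) -> str:
    return "".join(f"&#x{ord(ch):X};" for ch in text)

print(to_ncrs("अश्विन्"))
# Each of the seven code points becomes one &#x...; escape
```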

  5. Why does this PHP construct:

    normalizer_normalize( $search_string, \Normalizer::FORM_D );

    convert ÖÖÖ to OOO, but keep ÅÅÅ as ÅÅÅ ... WTF?! 🤔

    #programming #php #wtf #utf #utf8
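For what it's worth, NFD decomposes both characters. The sketch below uses Python's unicodedata (as a stand-in for PHP's Normalizer, which wraps the same Unicode algorithm) to show that Ö becomes O plus a combining diaeresis and Å becomes A plus a combining ring; whether you then "see" OOO usually depends on whether the combining marks are later stripped or how they render:

```python
import unicodedata

# NFD (canonical decomposition) splits precomposed letters into
# a base character plus combining marks.
assert unicodedata.normalize("NFD", "Ö") == "O\u0308"  # O + combining diaeresis
assert unicodedata.normalize("NFD", "Å") == "A\u030A"  # A + combining ring above
# Note: U+212B ANGSTROM SIGN also decomposes to the same pair:
assert unicodedata.normalize("NFD", "\u212B") == "A\u030A"
```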

  6. This year, the Christmas donation from @sweetgood went to the Umwelt-Treuhandfonds (UTF) (umwelt-treuhandfonds.de/). It funds the lawyers of climate activists who are currently facing massive repression.

    A further €50 went to the KUEÖ e.V., i.e. directly to @AufstandLastGen

    #SWEETGOOD #andersGOOD #LetzteGeneration #Klimaschutz #Schutz #UTF #Spende #spenden

  7. Just lost 3 hours to the charset encoding inferno: my source code is in UTF-8, but the library I use assumes 1 byte per char.
    Add to that, some fonts have only a subset of characters.
    You get a nice mix of UTF-8 characters that may or may not render nicely (depending on whether the first byte is a character present in the font).

    "Sometimes I wonder what's worse between charset encoding and timezones," says the guy who makes clocks and displays...

    #UTF-8 #ISO-8859 #ASCII #Hell
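A minimal Python sketch of the failure mode described above: multi-byte UTF-8 sequences read by code that assumes one byte per character turn into mojibake:

```python
# 'é' is two bytes in UTF-8 (0xC3 0xA9); a decoder that assumes
# one byte per character (here Latin-1) shows each byte separately.
text = "café"
raw = text.encode("utf-8")
garbled = raw.decode("latin-1")
assert garbled == "cafÃ©"
```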

  8. So my former colleague @jstepien is a brilliant engineer / speaker / teacher, but the thing he'll be internet famous for is how websites can't handle his name 🤷‍♂️. wtf-8.stępień.com is really funny, though.

    #encoding #utf #fail

  9. Did you know that apparently completely different strings are interpreted as identical by some tools?

    This is due to redundant UTF-8 encodings of the same Unicode characters.

    Read more below 🧵

    #InfoSec #CyberSecurity #Hacking #Pentesting #UTF #Unicode
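These "redundant" encodings are known as overlong encodings. A hypothetical sketch in Python: a naive two-byte decoder that only extracts the payload bits maps the illegal sequence 0xC0 0xAF to the same code point as the canonical single byte 0x2F ('/'), which is exactly the sort of thing path-traversal filters have been bypassed with. A conforming decoder rejects the overlong form:

```python
# Naive decoder for a two-byte UTF-8 sequence (110xxxxx 10xxxxxx)
# that does not check for overlong forms (for illustration only).
def naive_decode(b1: int, b2: int) -> str:
    return chr(((b1 & 0x1F) << 6) | (b2 & 0x3F))

assert naive_decode(0xC0, 0xAF) == "/"  # overlong form of U+002F
# A strict decoder, like Python's, rejects it:
try:
    b"\xc0\xaf".decode("utf-8")
except UnicodeDecodeError:
    print("rejected")
```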

  10. @Silberwoelfin Well, to be fair: #UTF has only been around for 22 years, so it naturally hasn't been implemented everywhere yet.

    *ducks*

  11. The #LetzteGeneration @AufstandLastGen is supported with its legal costs (#Rechtskosten) by the Umwelt-Treuhandfonds (#UTF). Anyone who wants to push back against the repression of the activists can do so here in a particularly pain-relieving way.

    »The Umwelt-Treuhandfonds (#UTF) was founded in 2021 to financially support climate and environmental activists in legal matters. Criminal proceedings, preventive detention or demonstration bans: through their diverse forms of protest, the activists take personal and legal consequences upon themselves. The Umwelt-Treuhandfonds ensures that the activists' constitutionally guaranteed rights are upheld in proceedings and that the consequences of their actions are minimized through competent legal representation.«

    umwelt-treuhandfonds.de/spende

  12. Another interesting portability problem: the #UTF-16, UTF-32, UCS-2 and UCS-4 encodings are byte order dependent. That means they can be encoded either as big endian or as little endian. When encoding strings, #Python uses the system byte order and prepends a Byte Order Mark at the start of the output. When decoding, it automatically reads the previously written BOM to determine the correct byte order, so everything "just works".

    Problems start when we try to compare encoded data at the byte level, e.g. comparing a file previously saved as UTF-16 with the result of calling `encode()`. If the file was saved on a little endian system (as is usually the case) and the tests are run on a big endian system, we suddenly get two different byte strings!

    The "obvious" solution is to force a specific byte order, e.g. by using the `utf-16-le` encoding instead of `utf-16`. But here another problem appears: when we specify a particular byte order, Python no longer writes the BOM, so a byte-level comparison will show a difference in the form of the missing BOM. This can be solved with a simple trick: prepending the BOM (`\ufeff`) to the string being encoded.

    github.com/python/importlib_re

    #portability #unicode #Gentoo

  13. Another curious #portability pitfall: the #UTF-16, UTF-32, UCS-2 and UCS-4 encodings are byte order dependent. That is, they can be encoded as either big endian or little endian. #Python uses the host byte order when encoding, and writes a Byte Order Mark at the beginning of the output. When decoding, it transparently reads the BOM back to determine the byte order, so everything works fine out of the box.

    Problems start happening when you compare the exact byte-level output, e.g. by comparing UTF-16 bytes read from a file with the result of `encode()`. If the file was written on a little endian system (which is commonly the case), and the test is running on a big endian system, you're suddenly going to get different byte strings!

    The "obvious" way to solve this is to force a specific endianness, e.g. use `utf-16-le` rather than plain `utf-16`. However, when you force endianness, the BOM is no longer written, so the byte-level comparison now fails on the missing BOM. The trick is to add the BOM (`\ufeff`) straight into the #unicode string.

    github.com/python/importlib_re

    #Gentoo
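The behavior described above fits in a few lines; this sketch only demonstrates the BOM trick and is not the code from the linked pull request:

```python
import sys

s = "gentoo"
with_bom = s.encode("utf-16")      # host byte order, BOM prepended
fixed_le = s.encode("utf-16-le")   # fixed byte order, no BOM
assert with_bom[:2] in (b"\xff\xfe", b"\xfe\xff")  # BOM is there
# The trick: prepend U+FEFF so the fixed-endian output matches byte for byte
if sys.byteorder == "little":
    assert ("\ufeff" + s).encode("utf-16-le") == with_bom
```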

  14. Chinese/Japanese/Korean characters take more bytes in #UTF-8 encoding than Latin letters. This seems unfair. However, CJK characters represent whole words or syllables, so CJK text in UTF-8 can still take fewer bytes than its English equivalent.

    hsivonen.fi/string-length/#:~:
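A quick check of the claim, using a hypothetical word pair: a CJK character costs three bytes in UTF-8 but often replaces several Latin letters:

```python
# Each CJK ideograph is 3 bytes in UTF-8; each ASCII letter is 1.
assert len("字".encode("utf-8")) == 3
# Six bytes of Chinese vs nine bytes of English for the same phrase:
assert len("谢谢".encode("utf-8")) == 6
assert len("thank you".encode("utf-8")) == 9
```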

  15. @nirvdrum @postmodern But they don't support #UTF-8 character property groups, which can be important if you can't rely on input always being ASCII. The name "Björn" might be an example where this could matter.
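As an illustration of why property-aware matching matters (in Python rather than whatever engine the thread was about): a Unicode-aware `\w` matches "Björn" in full, while ASCII-only matching stops at the non-ASCII letter:

```python
import re

# Unicode-aware word matching covers "ö"
assert re.fullmatch(r"\w+", "Björn") is not None
# ASCII-only matching does not
assert re.fullmatch(r"\w+", "Björn", re.ASCII) is None
```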