home.social

#utf — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #utf, aggregated by home.social.

  1. 🆕 blog! “A small collection of text-only websites”

    A couple of years ago, I started serving my blog posts as plain text. Add .txt to the end of any URL and get a deliciously lo-fi, UTF-8, mono[chrome|space] alternative.

    Here's this post in plain text - shkspr.mobi/blog/2025/12/a-sma

    Obviously a webpage…

    👀 Read more: shkspr.mobi/blog/2025/12/a-sma

    #blogging #blogs #text #unicode #utf-8

  6. Recently, we talked about #libid3tag and our intent to make a new release. So far, we have a preview of some changes that have already been made in the latest main:

    - Mojibake fixes for #UTF-16 (no BOM) encoded fields.
    - Some code cleanups, including warning fixes.
    - Compatibility with #CMake > 4.0 (we now require CMake 3.10+)

    Meanwhile, we are also working on #Doxygen documentation for the library, so quite a few things are going on for libid3tag right now.

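  11. #メモ #文字コード #ANSI #JIS #Shift_JIS
    I wanted to convert a text file written in #UTF-8 to Shift_JIS, but couldn't work out how to do it in Notepad++. Choosing "Japanese > Shift-JIS" under "Character sets" in the "Encoding" menu garbles the Japanese text.
    On a whim I chose "Convert to ANSI" under "Encoding" instead: nothing was garbled, and the encoding indicator at the bottom right changed to "ANSI". Wondering whether that might be it, I searched and learned about this for the first time today.

    When I create #Windows bat files in Notepad++, I always end up saving them in the default UTF-8 at first, and the text comes out garbled when they run. I would realise they had to be saved as Shift_JIS, but I didn't know how to fix that, so I had been using TeraPad's "Save with specified character/line-ending code". It turns out I could simply have used "Convert to ANSI" in Notepad++ and then saved.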
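
The same conversion can be sketched outside an editor. This is a minimal illustration, assuming that "ANSI" on a Japanese-locale Windows system means code page 932 (Microsoft's Shift_JIS variant), which Python exposes as the `cp932` codec. The function name `utf8_to_cp932` is made up for this example.

```python
def utf8_to_cp932(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as code page 932 (assumed to be the 'ANSI' target)."""
    # "utf-8-sig" also drops a leading UTF-8 BOM if one is present.
    text = data.decode("utf-8-sig")
    # Raises UnicodeEncodeError for characters that cp932 cannot represent.
    return text.encode("cp932")


sample = "echo 日本語".encode("utf-8")
converted = utf8_to_cp932(sample)

# Each kanji takes 3 bytes in UTF-8 and 2 bytes in cp932.
print(len(sample), len(converted))
```

Decoding `converted` with `cp932` gives back the original text, which is a quick way to check that nothing was lost.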

  13. UTF-8 Is Beautiful - It’s likely that many Hackaday readers will be aware of UTF-8, the mechanism for i... - hackaday.com/2025/09/14/utf-8- #softwarehacks #characterset #utf-8

  18. Very cool, copy-paste UTF text from, e.g., Wikipedia, get Unicode.
    Sanskrit अश्विन्
    can be in your HTML as
    अशिवन्
    r12a.github.io/app-conversion/
    #UTF #Unicode #conversion
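
One way to get an HTML-safe form of such text is to write each code point as a hexadecimal numeric character reference. The sketch below is not the linked converter's code; `to_hex_ncr` is a hypothetical helper, and it relies only on Python's standard `html` module for the round-trip check.

```python
import html


def to_hex_ncr(text: str) -> str:
    """Write every code point as a hexadecimal numeric character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


word = "अश्विन्"
escaped = to_hex_ncr(word)
print(escaped)

# Round trip: unescaping the references gives back the original string.
print(html.unescape(escaped) == word)
```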

  20. This year, the Christmas donation from @sweetgood went to the Umwelttreuhand-Fonds (UTF) (umwelt-treuhandfonds.de/). It funds the lawyers of climate activists, who are currently facing massive repression.

    A further €50 went to KUEÖ e.V., that is, directly to @AufstandLastGen

    #SWEETGOOD #andersGOOD #LetzteGeneration #Klimaschutz #Schutz #UTF #Spende #spenden

  21. Why does this PHP construct:

    normalizer_normalize( $search_string, \Normalizer::FORM_D );

    Convert ÖÖÖ to OOO, but keeps ÅÅÅ as ÅÅÅ ... WTF?! 🤔

    #programming #php #wtf #utf #utf8
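
For comparison, here is what canonical decomposition (NFD) alone does to both letters, sketched with Python's `unicodedata` rather than PHP's `Normalizer`. It does not explain the behaviour reported above. It only shows that NFD by itself turns each letter into a base letter plus a combining mark, and that a bare "O" or "A" appears only after a separate stripping step. That stripping step is an assumption about what the surrounding code might do.

```python
import unicodedata

O_UMLAUT = "\u00d6"  # Ö as a single precomposed code point
A_RING = "\u00c5"    # Å as a single precomposed code point

for ch in (O_UMLAUT, A_RING):
    decomposed = unicodedata.normalize("NFD", ch)
    print(ch, [f"U+{ord(c):04X}" for c in decomposed])


def strip_marks(text: str) -> str:
    """Decompose, then drop every combining mark (hypothetical follow-up step)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(strip_marks(O_UMLAUT * 3), strip_marks(A_RING * 3))
```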

  22. Just lost 3 hours to the charset encoding inferno: my source code is in UTF-8, but the library I use assumes 1 byte per char.
    Add to that, some fonts have only a subset of chars.
    You get a nice mix of UTF-8 chars that may or may not render nicely (depending on whether the first byte is a char present in the font).

    "Sometimes I wonder what's worse, charset encoding or timezones," says the guy who makes clocks and displays...

    #UTF-8 #ISO-8859 #ASCII #Hell
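
The mismatch is easy to reproduce. In this sketch, Latin-1 stands in for the library's one-byte-per-char assumption; the actual library and its character set are not named in the post.

```python
text = "\u00e9"  # é, one character

raw = text.encode("utf-8")                # stored as two bytes: 0xC3 0xA9
as_single_bytes = raw.decode("latin-1")   # each byte read as its own character

print(len(text), len(raw), len(as_single_bytes))
print(as_single_bytes)
```

One character has turned into two, and whether either of them renders then depends on the font.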

  23. So my former colleague @jstepien is a brilliant engineer / speaker / teacher, but the thing he'll be internet famous for is how websites can't handle his name 🤷‍♂️. wtf-8.stępień.com is really funny, though.

    #encoding #utf #fail

  24. Did you know that apparently completely different strings are interpreted as identical by some tools?

    This is due to redundant UTF-8 encodings of the same Unicode characters.

    Read more below 🧵

    #InfoSec #CyberSecurity #Hacking #Pentesting #UTF #Unicode
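
The thread itself is not included here, so the following assumes that "redundant encodings" refers to overlong UTF-8 forms. The hypothetical `naive_decode_2byte` function skips the overlong check and therefore reads the two bytes `C0 AF` as a plain "/", while Python's strict decoder refuses them.

```python
def naive_decode_2byte(data: bytes) -> str:
    """Decode one 2-byte UTF-8 sequence WITHOUT rejecting overlong forms."""
    lead, cont = data
    code_point = ((lead & 0x1F) << 6) | (cont & 0x3F)
    return chr(code_point)


overlong_slash = b"\xc0\xaf"  # overlong form of U+002F "/"
print(naive_decode_2byte(overlong_slash) == "/")

# A conforming decoder must reject the same bytes.
try:
    overlong_slash.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False
print(strict_accepts)
```

A filter that checks raw bytes for "/" and a later stage that decodes leniently would disagree about this input.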

  25. @Silberwoelfin Well, to be fair: #UTF has only been around for 22 years, so it hasn't been implemented everywhere that quickly.

    *ducks*

  26. #LetzteGeneration @AufstandLastGen is supported with its legal costs (#Rechtskosten) by the Umwelt-Treuhandfonds (#UTF). Anyone who wants to push back against the repression of the activists can do so here in a particularly pain-relieving way.

    "The Umwelt-Treuhandfonds (#UTF) was founded in 2021 to support climate and environmental activists financially in legal matters. Criminal proceedings, preventive detention, or bans on demonstrations: through their varied protest, the activists accept personal and legal consequences. The Umwelt-Treuhandfonds ensures that the activists' rights, as anchored in the rule of law, are upheld in proceedings and that the consequences of their actions are minimized through competent legal representation."

    umwelt-treuhandfonds.de/spende

  27. Another interesting problem from the field of portability: the #UTF-16, UTF-32, UCS-2 and UCS-4 encodings depend on byte order. This means they can be encoded either as big endian or as little endian. When encoding strings, #Python uses the system's byte order and writes a Byte Order Mark (BOM) at the beginning of the file. When decoding, it automatically reads the previously written BOM to determine the correct byte order, so everything "just works".

    Problems start when we try to compare encoded data at the byte level, e.g. comparing a file previously saved as UTF-16 with the result of calling `encode()`. If the file was saved on a little endian system (as is usually the case) and the tests are run on a big endian system, it suddenly turns out that we get two different byte strings!

    The "obvious" solution is to force a specific byte order, e.g. by using the `utf-16-le` encoding instead of `utf-16`. Here, however, another problem appears: when we specify a byte order, Python no longer writes the BOM, so a byte-level comparison shows a difference in the form of the missing BOM. This can be solved with a simple trick, though: prepending the BOM (`\ufeff`) to the string being encoded.

    github.com/python/importlib_re

    #przenośność #unikod #Gentoo

  28. Another curious #portability pitfall: the #UTF-16, UTF-32, UCS-2 and UCS-4 encodings are byte order dependent. That is, they can be encoded as either big endian or little endian. #Python uses the host byte order when encoding, and writes a Byte Order Mark (BOM) at the beginning of the file. When decoding, it transparently reads the BOM back to determine the byte order, so everything works fine out of the box.

    Problems start when you compare the exact byte-level output, e.g. by comparing UTF-16 bytes read from a file with the result of `encode()`. If the file was written on a little endian system (which is commonly the case), and the test is running on a big endian system, you're suddenly going to get different byte strings!

    The "obvious" way to solve this is to force a specific endianness, e.g. use `utf-16-le` rather than plain `utf-16`. However, when you force endianness, the BOM is no longer written, so the byte-level data now mismatches on the missing BOM. The trick is to add the BOM (`\ufeff`) straight into the #unicode string.

    github.com/python/importlib_re

    #Gentoo
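
The trick from the post can be checked in a few lines. This is a small sketch rather than the code from the linked repository; the sample string is arbitrary.

```python
import codecs

text = "abc"

# Force little endian and put the BOM into the string itself.
with_bom = ("\ufeff" + text).encode("utf-16-le")

# The result is the little endian BOM followed by the little endian payload,
# regardless of the byte order of the machine running this.
print(with_bom == codecs.BOM_UTF16_LE + text.encode("utf-16-le"))

# Plain "utf-16" decoding still consumes the BOM and returns the text.
print(with_bom.decode("utf-16") == text)
```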

  29. Chinese/Japanese/Korean characters take more bytes in #UTF-8 encoding than Latin letters. This seems unfair. However, CJK characters represent whole words or syllables, so CJK text in UTF-8 can still take fewer bytes than its English equivalent.

    hsivonen.fi/string-length/#:~:
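
A single hand-picked pair shows the effect. The pairing of the Chinese phrase with its English rendering is chosen for this example and is not taken from the linked article, so treat it as an illustration rather than a measurement.

```python
chinese = "我爱你"
english = "I love you"

for label, phrase in (("Chinese", chinese), ("English", english)):
    encoded = phrase.encode("utf-8")
    print(label, len(phrase), "characters,", len(encoded), "bytes")
```

Each of the three Chinese characters takes three bytes, yet the whole phrase is still one byte shorter than the ten-byte English one.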

  30. @nirvdrum @postmodern But they don't support #UTF-8 character property groups, which can be important if you can't rely on input always being ASCII. The name "Björn" might be an example where this could matter.
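
The tool under discussion is not named in this post, so the sketch below uses Python's `re` module only to show why an ASCII-only character class is not enough for a name like this. Python's `re` has no `\p{...}` property syntax either; its `\w` is Unicode-aware for `str` patterns unless `re.ASCII` is set.

```python
import re

name = "Bj\u00f6rn"  # Björn, with ö as a single precomposed code point

ascii_only = re.fullmatch(r"[A-Za-z]+", name)
unicode_aware = re.fullmatch(r"\w+", name)
ascii_flag = re.fullmatch(r"\w+", name, flags=re.ASCII)

print(ascii_only is None, unicode_aware is not None, ascii_flag is None)
```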
