home.social

#characterencoding — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #characterencoding, aggregated by home.social.

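  1. 📰 技育祭で引いた文字化けおみくじを解読してみた ("I tried decoding the garbled fortune slip I drew at Geek Festival") (👍 29)

    🇬🇧 Decoding a garbled fortune slip from Geek Festival 2026 - a fun character encoding adventure
    🇰🇷 A fun account of decoding an omikuji (fortune slip) with garbled characters, received at a tech festival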

    🔗 zenn.dev/toramutton/articles/g

    #CharacterEncoding #Debugging #Tech #Zenn

  5. I'm on a Mac, where all filesystems are UTF-8. I want to clone a #git repo which has ISO-8859 filenames which are not valid UTF-8 - github.com/IanDarwin/OpenLookC. Is there any way of doing that which will translate filenames back and forth on the fly when I `git pull`?

    I worked around it by creating a #ZFS dataset with `utf8only=off`, cloning onto that, and manually renaming the two problematic files, but that obviously leaves my copy different from origin so I can't cleanly pull.

    #CharacterEncoding
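
The question above turns on filenames whose bytes are not valid UTF-8. As a rough sketch only (it does not provide the on-the-fly translation being asked for), the following Python shows how such names can be detected and what their UTF-8 equivalents would be, assuming the names really are ISO-8859-1. The sample filenames and both helper functions are hypothetical, made up for illustration.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_name_to_utf8(raw: bytes) -> bytes:
    """Reinterpret a filename as ISO-8859-1 and re-encode it as UTF-8.

    Assumption: the name really is ISO-8859-1. Every byte value maps to
    a code point in that encoding, so this step cannot fail, which also
    means it cannot tell you if the guess was wrong.
    """
    return raw.decode("iso-8859-1").encode("utf-8")


# Hypothetical filenames as raw bytes; 0xE9 is "é" in ISO-8859-1.
names = [b"README", b"caf\xe9.txt", "café.txt".encode("utf-8")]

for raw in names:
    if is_valid_utf8(raw):
        print(raw, "-> already valid UTF-8")
    else:
        print(raw, "-> would become", latin1_name_to_utf8(raw))
```

Renaming files this way still leaves the working copy different from origin, which is the same limitation the post describes.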

  9. Even though digitality is rather marginalized at #DOT2026, I'm pleased that my CLIO Guide on digitizing the cultural heritage of societies of the Global South, which had long been stuck in the publication process, went online today. If you'd like to learn about the representational power of monolingual infrastructures, character encodings, transliterations, catalogues as historical sources, shadow libraries, and so on, all of it illustrated with Arabic periodicals: doi.org/10.60693/p46s-8j72

    #multilingualDH #epistemicViolence #characterEncoding #الصحافة_العربية

  14. «Unicode is good. If you’re designing a data structure or protocol that has text fields, they should contain #Unicode characters encoded in #UTF8. There’s another question, though: “Which Unicode characters?” The answer is “Not all of them, please exclude some.”

    This issue keeps coming up, so [ @paulehoffman and @timbray ] put together an individual-submission draft to the IETF and now (where by “now” I mean “two years later”) it’s been published as #RFC9839. It explains which characters are bad, and why, then offers three plausible less-bad subsets that you might want to use.»

    tbray.org/ongoing/When/202x/20 by @timbray

    #programming #CharacterEncoding #LML
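
To make "exclude some" concrete, here is a minimal Python sketch of a checker. It assumes the problematic categories are surrogates, legacy controls (C0 other than tab, LF, and CR, plus DEL and C1), and noncharacters. The quoted post does not list the categories, so treat these ranges as an assumption and check them against RFC 9839 before relying on them. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def problem_kind(ch: str) -> Optional[str]:
    """Classify one code point, or return None if it looks unproblematic."""
    cp = ord(ch)
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    if cp in (0x09, 0x0A, 0x0D):
        return None  # tab, LF, CR: assumed to be the useful controls
    if cp < 0x20 or 0x7F <= cp <= 0x9F:
        return "legacy control"
    # U+FDD0..U+FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    return None


def find_problems(text: str) -> List[Tuple[int, str, str]]:
    """List (index, code point, kind) for every flagged character."""
    found = []
    for i, ch in enumerate(text):
        kind = problem_kind(ch)
        if kind is not None:
            found.append((i, f"U+{ord(ch):04X}", kind))
    return found


# The trailing newline is deliberately left unflagged.
sample = "ok\x00\ud800\ufdd0\n"
for hit in find_problems(sample):
    print(hit)
```

A protocol could reject input for which `find_problems` returns anything, or choose a looser subset. Which subset fits is a design decision the post leaves to the reader.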

  20. Like other computing and network systems developed at Xerox, Interlisp-D supported XCCS (Xerox Character Code Standard), a 16-bit character encoding released in the 1980s. XCCS predated and influenced Unicode.

    This is version 2.0 of the standard:

    github.com/Interlisp/medley/bl

    #CharacterEncoding #xerox #retrocomputing

  25. I really love @dylanbeattie's talks.

    I've seen the previous version of this that he references at the start, but watched this anyway, because it's a great talk.

    Life as a sysadmin has taught me a lot of the lessons in here, but there's SO MUCH more background covered than I ever knew. So, still very useful.

    youtu.be/gd5uJ7Nlvvo

    #UTF #PlainText #CharacterEncoding #PikeMatchbox

  29. If you have been spared #characterencoding hell, consider yourself fortunate. Every time I start to dig into it, I marvel at how this whole mess could have been avoided with just a little foresight. As soon as ASCII-only stopped being the norm, someone could have created a container format for text files that worked like any other media container: a file header that says, for example, "this is ISO-8859-1", "CP-1252", "UTF-8", or whatever. That would have removed all ambiguity.
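
A small Python sketch of the ambiguity, and of the kind of header the post imagines. The `charset=` header line and the `wrap`/`unwrap` helpers are hypothetical, invented here for illustration; they are not an existing format.

```python
# The same bytes decode without error under three encodings,
# but only one result is what the author meant.
data = "café €".encode("utf-8")
for name in ("utf-8", "iso-8859-1", "cp1252"):
    print(name, repr(data.decode(name)))


def wrap(text: str, encoding: str) -> bytes:
    """Prefix the encoded text with a one-line ASCII charset header."""
    header = f"charset={encoding}\n".encode("ascii")
    return header + text.encode(encoding)


def unwrap(blob: bytes) -> str:
    """Read the charset header, then decode the payload accordingly."""
    header, _, payload = blob.partition(b"\n")
    key, _, encoding = header.decode("ascii").partition("=")
    if key != "charset" or not encoding:
        raise ValueError("missing charset header")
    return payload.decode(encoding)


# Round trip: the reader never has to guess the encoding.
print(unwrap(wrap("café €", "cp1252")))
```

Mechanisms in this spirit do exist for particular formats, such as the byte order mark, the XML declaration, and the `charset` parameter of an HTTP `Content-Type` header. What the post wishes for is one that applied to every text file.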

  34. ...
    While part of me wants to find out why this odd character encoding situation crashes #perl, another part of me knows that #characterencoding stuff is a big pit of misery, suffering, and wasted time/life that you will never get back, so I'm just treating that crash as another way to debug tags in Vorbis files.

    The oddest mystery is that someone managed to get a string without a COMMENT type container into a #vorbis metadata block. That's impressive; you really have to try to do that!
