#mediadescription — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #mediadescription, aggregated by home.social.
-
CW: Elaborate media description for video in OP
A video showing off a bunch of wooden kinetic contraptions:
- a shrimp riding a unicycle on a toilet roll as it gets unrolled
- a swing band consisting of shrimps playing guitar, keytar, trumpet and saxophone, as they swing, dance, or spin around.
- a hand cranking the mechanism; you can see the wooden tumblers go up and down as it spins. Above the mechanism is a lid with pyramids on top and the text "what is hidden under the pyramids?". As another hand lifts the lid, a blue, a green and a grey alien are revealed, playing some kind of brass wind instrument, turntables, and a keytar.
- a frog with top hat, riding a galloping snail
- a skeleton dancing
- a skeleton with red hearts for eyes, dropping its jaw
- a shrimp behind a wooden table with a cutting board, chopping orange and grey foods.
- 3 beans with faces and limbs, dancing on top of a can of beans; one has a boombox, while another spins on its head.
- a wooden kitchen table, full of food, with mice swinging and dancing around with the food.
- 3 pumpkins; one playing a drum, another a harmonica, and the third a saxophone
- a cat with a blue wizard hat, stirring a cauldron with a green brew.
- a dog riding a Small Red Car
- A frog with a top hat riding a unicycle on top of a roll of toilet paper being unrolled in its holder.
- a shrimp in a suit, riding a skateboard over a treadmill, hopping over shells, fish, and other things.
- a beaver and a hedgehog, clapping their paws.
- a bunch of pasta dancing at a rave: rave-ioli. There's a purple turntable and big speakers.
-
@Chris Another good question is what the Fediverse (Mastodon specifically) expects in an alt-text for a video. A summary which should probably go outside the video? Or a visual description of what's shown in the video, just like an alt-text for an image, but for moving and constantly changing visuals and maybe even time-coded?
#AltText #AltTextMeta #CWAltTextMeta #VideoDescription #VideoDescriptions #MediaDescription #MediaDescriptions -
@iFixit "...and it doesn't look like you can attach documents to posts"
You can't on Mastodon. I could, both here on Hubzilla and on (streams) where I post my images.
But I wouldn't have to. Vanilla Mastodon has a character limit of 500. Hubzilla has a character "limit" that's so staggeringly high that nobody knows how high it is because it doesn't matter. (streams), from the same creator and the same software family as Hubzilla, has a character "limit" of over 24,000,000, which is not an arbitrary design decision but simply the size of the database field.
By the way: Both are in the Fediverse, and both are federated with Mastodon, so Mastodon's "all media must have accurate and sufficiently detailed descriptions" rule applies there as well unless you don't care if thousands upon thousands of Mastodon users block you for not supplying image and media descriptions.
In theory, I could publish a ten-minute video and, in the same post, add a full, timestamped description that takes several hours to read. Verbatim transcript of all spoken words. Detailed description of the visuals, where "detailed" means "as detailed as Mastodon loves its alt-texts" as in "800 characters of alt-text or more for a close-up of a single flower in front of a blurry background" detailed. Detailed description of all camera movements and cuts. Description of non-spoken-word noises. All timestamped, probably with over a hundred timestamps for ten minutes of video.
Now I'm wondering if that could be helpful or actually required, or if it's overkill and actually a hindrance.
CC: @masukomi @GunChleoc
#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #Mastodon #Hubzilla #Streams #(streams) #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #MediaDescription #MediaDescriptions -
@masukomi @iFixit And this is only mostly a transcript of the spoken words.
What if someone actually took upon themselves the effort to describe a video with a timestamped/timecoded combination of visual description, spoken word transcript and non-spoken word audio description? Especially if the visual description is on the same high level of detail that's expected in the Fediverse?
CC: @GunChleoc
#FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #MediaDescription #MediaDescriptions -
@Christopher M0YNG Just out of curiosity: What would be an appropriate textual description for a video?
A description of what the video is about?
Or a detailed, time-coded description of the actual visuals throughout the whole video plus a detailed, time-coded transcript of the audio in the video?
If the latter, what details are required, regardless of topic and content?
CC: @Stefan Bohacek
#VideoDescription #VideoDescriptions #MediaDescription #MediaDescriptions -
CW: Not sure if adding this channel to Trunk would be a good idea; CW: long (over 6,000 characters), Fediverse meta, Fediverse beyond Mastodon meta, alt-text meta, image description meta
I'm thinking about adding at least one of my channels to Trunk. I mean, it isn't like I don't have enough followers; they've risen above 500 again. But Trunk would help people follow me for a better reason than just one cool post or comment, all still without having to figure out how to check my profile.
That said, Trunk requires you to volunteer on at least one list, in at least one topic. That's where things get difficult.
For one, there's Described Media. I'm not even kidding: It's a list for people who describe the media which they post. People who add alt-text to their images. Even though everybody in the Fediverse is expected to do it all the time, at least if their posts reach Mastodon in some way.
I do it. But I don't do it "the standard Mastodon way". For one, Mastodon's limitations, especially the 500-character limit for posts, don't apply to me. I don't have any character limit in my posts. Thus, nothing forces me to describe and even explain an image only in alt-text because I've got plenty of space in my posts.
Besides, my images require absolutely massive image descriptions, especially taking all those typical image description guidelines into consideration. That's because none of them are prepared for the edge cases that are my images. And by "absolutely massive", I don't mean, "800 characters? Are you nuts?! Who's gonna read that?!?" I mean up to over 60,000 characters, and I can guarantee you this is not a typo. Maybe even more in the future.
I'm not quite convinced that I'm a good example of a provider of media descriptions, partly because by adhering to general image description rules, I break most of Mastodon's image description rules, partly because next to nobody has the patience to read one image description that's longer than 120 toots or have it read to them by a screen reader, partly also because my own image descriptions become obsolete so quickly whenever I discover something new that I should do in image descriptions.
Even if none of this mattered, I don't post images often. Maybe once every couple months. That's because I have to schedule my image posts due to how much time they consume. The 60,000-character description took me two full days to research and write, breakfast to after dinner. And it might become even rarer in the future. I've started a dedicated (streams) channel to be able to post images with sensitive content, including but not limited to eyes and faces. But posting these will eat up the time I could also use to post perfectly safe images on this Hubzilla channel.
The Described Media list is rather for people who routinely whip up 200 characters of alt-text in under a minute or so, but who do so at least daily.
An even more obvious list, at least at first glance, would be 3D Virtual & Augmented Reality, seeing as the primary topic of this channel is OpenSim. In fact, in the long run, I could add two or three channels to this list.
But OpenSim does not fit on it. The list is for actual virtual reality, for new virtual reality and augmented reality developments of the 2020s. "The Metaverse" as envisioned by most. It absolutely requires VR or AR headsets, full stop.
OpenSim has been using the term "metaverse" routinely since as early as 2007, the year of its inception. But the list is not about "metaverse". It's about VR.
And OpenSim is what's commonly called a "pancake". It's made for desktop and laptop computers and their 2-D screens. It does not really work on VR headsets. It does not work on stand-alone VR headsets with integrated graphics hardware at all. That's mainly because VR headsets require a constantly guaranteed frame rate of 60fps. It isn't simplified and cartoonish and geared towards mobile graphics hardware like Horizons or Rec Room or the like. Instead, it's largely photo-realistic, high-detail stuff with high-resolution textures.
You may get 60fps out of a dedicated graphics unit on a not-too-highly-detailed sim when you're alone. But have more than a few avatars around, and your fps will drop below 60. Join a party or any other event with a couple dozen avatars, and you're heading for slideshow-level fps. That's because the avatars aren't made by the OpenSim devs and optimised for high performance. They consist almost entirely of user-supplied content that's optimised for good looks. Some two years ago, one average avatar had more vertices than an entire scene in World of Warcraft. They've only gotten much, much more complex since then.
A liquid-cooled RTX 4090 overclocked to kingdom come won't give you 60fps at 1080p at OSgrid's Event Plaza on a Friday night. So what chance does a stand-alone, passively-cooled headset based on phone hardware have if it has to whip up even more pixels? And none of this even takes into account the recently introduced Physically-Based Rendering, which absolutely requires dedicated graphics hardware with no less than 4GB of dedicated VRAM, preferably at least 8GB.
Besides, you couldn't use OpenSim on a stand-alone headset anyway. There are only two OpenSim-compatible viewers available right now, they're only available for desktop operating systems, and their highly complex UIs (pull-down menus like you've last seen in Photoshop etc.) are entirely geared towards desktop and laptop computers.
In brief: OpenSim is not VR, and it's unlikely to ever truly become VR.
Okay, I still have the option to ask one of the four Trunk admins to add an extra "Virtual Worlds" list, arguing that OpenSim, just like Second Life, is not VR and thus doesn't fit onto a VR & AR list. But they might argue that it's close enough to VR & AR that a separate list isn't justified.
#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #MediaDescription #MediaDescriptions #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #OpenSim #OpenSimulator #Metaverse #VirtualWorlds #VR #VirtualReality #AR #AugmentedReality #Trunk -
@Robert Kingett, blind I don't trust anything generated. At least not with super-obscure niche content like what I post.
And audio descriptions in general are why I'll never publish videos in the Fediverse.
I'd have to go into similar detail as for my pictures, only for moving pictures plus sound plus voice-over now. My descriptions would have to be so detailed that the video would have to pause to let the audio description catch up with the visuals. In fact, the video would spend more time paused while the audio description is rambling than actually moving, and it would never spend more than a few seconds moving at a time.
For one, I would have to describe and explain what the video shows at the very same level of detail as I describe my images. And at least once, I've described a single image in so much detail that it'd probably take a screen reader a full hour to read the image description aloud.
Besides, I would have to take into account that it's a video. Everything would need timestamps. And instead of only describing the camera position and the camera angle, I would have to describe the camera movements like so:
Seven minutes, eighteen point one three seconds: The camera quickly rotates to the left around a vertical axis through a point roughly two point four metres straight ahead of the avatar. It starts rotating from the direction in which the avatar is facing, roughly twelve degrees to the east of north. The barn, which first appeared at five minutes, fifty-two point two eight seconds, comes into view again, including all decoration around it. The camera only rotates around this vertical axis and not around any horizontal axis. The avatar does not rotate with the camera.
Seven minutes, eighteen point six four seconds: The video pauses to let this description catch up.
Seven minutes, eighteen point seven one seconds: The video no longer pauses. The camera reaches a rotation angle of roughly twenty degrees to the south of west. The rotation speed of the camera slows down. It continues to rotate to the left.
Seven minutes, eighteen point nine three seconds: The video pauses to let this description catch up.
Seven minutes, nineteen point zero four seconds: The video no longer pauses. The camera stops rotating at an angle of roughly twenty-five degrees to the west of south.
That is, in order to cater to deaf-blind users, I would have to have two time codes. One, the time code of the original video, not taking the pauses into account. Two, the time code of the described video with catch-up pauses.
And the video with catch-up pauses would be dramatically longer than the original video. Ten minutes of video would take me weeks to describe, probably over a month. And it would end up many hours long, depending on how much there is to describe and explain.
So a time code in the Braille description for deaf-blind users might actually read, "Six minutes, thirty-seven point five five seconds in the original video, fourteen hours, three minutes, forty-nine point two one seconds in this described version of the video."
By the way, no, an AI can't do that.
#Long #LongPost #CWLong #CWLongPost #MediaDescription #MediaDescriptions #AudioDescription #AudioDescriptions -
CW: Accessibility standards keep me from posting in-world videos; CW: long (over 2,600 characters), Fediverse meta, accessibility meta, video description meta
This sounds like good advice...
Fedi.Tips wrote the following post Mon, 15 Apr 2024 17:28:51 +0100:
If you're posting a video clip or an audio clip attached to a post, remember to include a text description which describes the sound. This is important so that the video or audio is accessible to deaf people.
Also, if it's a video, it's important to describe both the sound and the visuals so that it's accessible to everyone.
Text descriptions for audio and video are added just like text descriptions for images (exact steps vary depending on which app you use).
#FediTips #Accessibility
...but in my case, this would get out of hand. So much so that I've completely discarded the idea of posting in-world videos.
I'm someone who has taken most of a day to describe three still images in a post in a combined almost 77,000 characters which take over an hour to read. No, you haven't misread any of this. And yes, this effort is necessary in my case.
Of course, if I were to describe a video, I'd have to go into just as much detail. However, there'd be a whole lot more to describe.
The video would constantly change. It would show much, much more than a still image. There'd be audio that'd require detailed description instead of just name-dropping. All of it. Yes, including panning position. Movements of my avatar would have to be described. Movements of the camera, both around my avatar and independently of it, would have to be described. All movements would of course require distances, angles, speeds and changes of speed.
The description would require a time code: Everything that happens would have to be mentioned including when exactly it happens, and since things might happen quickly or in quick succession, I'm talking about at least tenths of seconds.
Ten minutes of in-world video would take me weeks to describe, and the description would be the length of a novel and take a whole day to read.
Mastodon users would never see the post with the video because, as far as I know, Mastodon automatically rejects all external posts that exceed 100,000 characters, and I'm talking about millions of characters here. I don't even know if Hubzilla would let me post that much, even though Hubzilla doesn't have any character limits except for what the Web server can handle.
Nobody would ever read this, so the whole effort would be in vain. But anything less than this would be critically lacking.
#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #MediaDescription #MediaDescriptions #VideoDescription #VideoDescriptions #A11y #Accessibility -
@Andre Louis What kind of description do you have in mind?
Only a brief mention what the audio is?
Or a full verbatim transcript of all words in the audio plus full descriptions of all the other sounds in the audio, time-coded in seconds and milliseconds and, if it's music, additionally in bars, quarters and even shorter notes?
#MediaDescription #MediaDescriptions -
@Octavia con Amore :pink_moon_and_stars: Such transcripts don't go into alt-text. I mean, into the alt-text of what should they even go?
They go into the post itself, of course. At least I hope that there's enough room for that on Funkwhale and/or Castopod. And nobody would host a podcast on Mastodon which, except for Threads, is the only Fediverse project with a 500-character limit.
As for my image descriptions, as I've said, these monsters go into the post text body. Right where you have a 500-character limit, and I don't have any limit whatsoever. The three links in my previous comment should demonstrate how I do it.
It isn't very accessible to have blind or visually-impaired users open a separate webpage just to have an image described, especially if they're on a phone, and they have the post with the image in one app (their Mastodon app) and the description in another (their browser). It's always best to have the image and the description in the same place.
#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #AltText #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #Transcript #Transcripts #MediaDescription #MediaDescriptions #Accessibility #A11y -
@Octavia con Amore :pink_moon_and_stars: I'm serious in everything.
When I said I write extremely long image descriptions, I was absolutely serious.
And just like visuals need to be described for non-sighted people, audio needs to be described for people without or with impaired hearing. At least, pure spoken word audio needs verbatim transcripts. But I'm pretty sure there are people in the Fediverse who demand written descriptions for all audio in the Fediverse.
#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #AltText #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #Transcript #Transcripts #MediaDescription #MediaDescriptions #Accessibility #A11y -
@Octavia con Amore :pink_moon_and_stars: Asking as someone who already writes image descriptions that exceed anything known to Mastodon by titanic magnitudes:
What would be sufficient alt-text for music? A detailed description of every bit of sound in that piece of music, interspersed with lyrics and a description of the sung melody, time-coded in both milliseconds and bars?
#AltText #MediaDescription #MediaDescriptions #Accessibility #A11y #Music -
@fastfinge @Xantastic "Image description" if it's an image, "audio transcript" or maybe "audio description" if it's audio, "video transcript" or maybe "video description" if it's a video, "media description" as a more general term.
Mastodon only refers to it as "alt-text" because that's where Mastodon users always put it. After all, alt-text gives them 1,500 characters per image, but the toot only gives them 500 characters minus content warnings minus hashtags minus mentions etc.
I use more appropriate terms and more appropriate places to put my descriptions because I don't have to worry about character limits here on Hubzilla.
#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #ImageDescription #ImageDescriptions #AltText #ImageDescriptionMeta #CWImageDescriptionMeta #MediaDescription #MediaDescriptions #AudioDescription #AudioDescriptions -
@fastfinge @modulux The question that nobody can agree on an answer to is: How detailed is the minimum requirement for media descriptions? How detailed is optimal? How detailed is too much, and is there such a thing as "too much"?
I'm someone whose "optimal" for image descriptions is probably beyond "too much" for many readers and definitely "too much" for almost all writers, and it keeps getting worse. I can post a 37,000-character description for one image that took me over 13 hours to research and write and find it lacking in multiple ways afterwards. In fact, I've done so.
Now I'm wondering what'd be an optimal audio description for music. Since I'm also a hobbyist musician, I might try to approach describing music in a way that goes into similar detail as sheet music, only that it includes sounds as well. Something that involves describing each audio part separately, although I'm not sure whether I should use the time within the whole audio file, the time only for the song, the bar-based timecode for the song or two or all three of them.
So I guess reading my audio description for one song is likely to take longer than listening to the whole album, if not multiple albums. But it'd be detailed and hopefully informative. That is, if I find a way to describe individual sounds including what effects do to them that's satisfying both for people who have turned deaf and people who were born deaf, both being complete laypeople when it comes to music.
But whether that's the right way, still not sufficient or way overkill, I don't know.
#Long #LongPost #CWLong #CWLongPost #FediMeta #FediverseMeta #CWFediMeta #CWFediverseMeta #ImageDescription #ImageDescriptions #AltText #ImageDescriptionMeta #CWImageDescriptionMeta #MediaDescription #MediaDescriptions #AudioDescription #AudioDescriptions -
CW: Thinking about making videos, dropping it because of accessibility requirements; CW: long (2,940 characters)
A couple of times in the past, I've considered making virtual-world videos once I have a sufficiently powerful graphics card and a screen with a resolution that won't make me the laughing stock of the Fediverse. I think I have the graphics card now.
I was going to publish them on PeerTube, which is part of the Fediverse in case you don't know yet.
I no longer am.
Fulfilling the accessibility requirements in the Fediverse in a sufficiently informative way would not only be an out-right titanic effort. It would make my videos borderline unwatchable.
I'm someone who posts a picture of a shelf with a few dozen boxes on it and describes it with over 40,000 words. I have to because people wouldn't even get what's in the picture in the first place if I didn't. Imagine what I'd do in a video that shows much more than that. Much much more.
Very early in the video, it'd freeze so I could explain where it was made, which would take a few minutes. Once everything in-world is visible for the first time, the video would freeze for half an hour or more while I describe and explain absolutely everything within the video frame. Whenever I move or turn, or the camera moves or turns, and something new comes into view, the video would freeze again for several minutes of description and explanation. And so forth.
In fact, the video freezes would end up quite long because I would have to speak slowly, clearly and in Simple English. That wouldn't make my descriptions any shorter, though.
A five-minute clip would be inflated to six hours or more.
The effort to get there would be gargantuan. Even for short videos, writing the audio description would take me weeks. Rephrasing everything in Simple English would take several more weeks. The recording itself would take several days. Of course, I would have to transcribe what I have said in the original video to make subtitles. Then I would have to weave the transcriptions into the audio description script to create special subtitles for deaf-blind users. You never know what they might be interested in, no matter how niche your videos are.
And then it'd all be in vain. Nobody would watch the videos, either because they'd be way too long or because nobody is interested in the topic or both. Thus, there wouldn't be any comments on them.
So I wouldn't know if I had done everything right by being 100% compliant with WCAG 2.2 and the Fediverse accessibility requirements. Maybe I've missed something and not been thorough enough, but I wouldn't know. Or maybe I've completely overdone it and rendered the video unwatchable by being utterly overcompliant with accessibility requirements, and a tiny fraction of what I've done would have been sufficient. But since nobody would ever comment, I wouldn't know.
#Fediverse #Accessibility #A11y #Inclusion #MediaDescription #MediaDescriptions #AudioDescription #AudioDescriptions #Long #LongPost #CWLong #CWLongPost