#statistical-analysis — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #statistical-analysis, aggregated by home.social.
-
Per-image PCA characterization of the Kodak image suite (PDF and JSON)
https://github.com/PearsonZero/kodak-pcd0992-statistical-characterization/tree/main/baseline
#HackerNews #PCA #KodakImageSuite #StatisticalAnalysis #ImageProcessing #DataScience
-
The Silent Breach: Why Your Security Gateway Can’t See the Malware in Your Images
3,217 words, 17-minute read time.
The Invisible Threat: Why Modern Cybersecurity Cannot Afford to Ignore Digital Steganography
In the current era of high-frequency cyber warfare, the most effective weapon is not necessarily the one with the highest encryption standard, but the one that remains entirely undetected until the moment of execution. While the industry spends billions of dollars perfecting cryptographic defenses to ensure that intercepted data cannot be read, a more insidious technique is resurfacing in the arsenals of advanced persistent threats: steganography. Unlike encryption, which transforms a message into an unreadable cipher—essentially waving a red flag that says “this is a secret”—steganography focuses on concealing the very existence of the communication. By embedding malicious payloads, configuration files, or stolen credentials within seemingly mundane carriers like a digital photograph of a corporate headquarters or a standard text readme file, attackers are successfully bypassing traditional security perimeters. Analyzing recent threat actor behaviors reveals that this is no longer a niche academic curiosity but a foundational component of modern malware delivery and data exfiltration strategies.
The primary danger of digital steganography lies in its exploitation of trust and the inherent limitations of automated scanning tools. Most Security Operations Centers (SOCs) are tuned to identify known malicious file signatures, suspicious executable behavior, or anomalies in encrypted traffic. However, a JPEG or PNG file is generally viewed as benign, often passing through email gateways and firewalls with minimal scrutiny beyond a basic virus scan. When a hacker hides data inside these files, they are leveraging the “noise” of the digital world to mask their signal. This methodology allows for a level of persistence that is difficult to combat, as the malicious content does not reside in a separate file that can be easily quarantined, but is woven into the fabric of legitimate business assets. As we move further into a landscape defined by zero-trust architectures, understanding the technical mechanics of how these hidden channels operate is a prerequisite for any robust defense strategy.
The Mechanics of Deception: How Least Significant Bit (LSB) Encoding Exploits Image Data
To understand how a hacker compromises a digital image, one must first understand the underlying structure of digital color representation. Most common image formats, such as 24-bit BMP or PNG, represent pixels using three color channels: Red, Green, and Blue (RGB). Each of these channels is typically allocated 8 bits, allowing for a value range from 0 to 255. When an attacker utilizes Least Significant Bit (LSB) encoding, they are targeting the rightmost bit in that 8-bit sequence. Because this bit represents the smallest incremental value in the color intensity, changing it from a 0 to a 1 (or vice versa) results in a color shift so small that it is visually indistinguishable to the human eye. For instance, a pixel with a Red value of 255 (11111111 in binary) that is changed to 254 (11111110) remains, for all practical purposes, the same shade of red to any casual observer or standard display monitor.
By systematically replacing these least significant bits across thousands of pixels, an attacker can embed an entire secondary file—such as a PowerShell script or a Cobalt Strike beacon—within the “carrier” image. The process begins by converting the malicious payload into a binary stream and then iterating through the pixel array of the target image, swapping the LSB of each color channel with a bit from the payload. A standard 1080p image contains over two million pixels, which provides ample “real estate” to hide significant amounts of data without causing the type of visual artifacts or “noise” that would trigger a manual review. Furthermore, because the overall file structure and headers of the image remain intact, the file continues to function perfectly as an image, successfully deceiving both the end-user and many signature-based detection systems that only verify if a file matches its declared extension.
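The mechanics described above can be sketched in a few lines of Python. This is an illustrative toy, not any real tool's API; the helper names and the flat list of 8-bit channel samples are assumptions made for the example.

```python
# Toy LSB embedding sketch: each payload bit replaces the least significant
# bit of one 8-bit color-channel value. Hypothetical helpers, for illustration.

def embed_lsb(channel_values, payload: bytes):
    """Return a copy of channel_values with payload bits written into the LSBs."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(channel_values):
        raise ValueError("carrier too small for payload")
    stego = list(channel_values)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set the payload bit
    return stego

def extract_lsb(channel_values, n_bytes: int):
    """Read n_bytes back out of the LSBs, most significant bit first."""
    out = bytearray()
    for j in range(n_bytes):
        byte = 0
        for bit_index in range(8):
            byte = (byte << 1) | (channel_values[j * 8 + bit_index] & 1)
        out.append(byte)
    return bytes(out)

# A channel value of 255 (11111111) carrying a 0 bit becomes 254 (11111110):
carrier = [255] * 64
stego = embed_lsb(carrier, b"hi")
assert max(abs(a - b) for a, b in zip(carrier, stego)) <= 1  # imperceptible shift
assert extract_lsb(stego, 2) == b"hi"
```

Note that every changed sample moves by at most 1 out of 255, which is exactly why the modification is invisible on screen.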
The technical sophistication of LSB encoding can be further heightened through the use of pseudo-random number generators (PRNGs). Instead of embedding the data in a linear fashion from the first pixel to the last—which creates a detectable statistical pattern—the attacker can use a secret key to seed a PRNG that determines a non-linear path through the pixel map. This effectively scatters the hidden bits throughout the image in a way that appears as natural “entropy” or sensor noise to basic statistical analysis tools. Consequently, without the specific algorithm and the corresponding key used to embed the data, extracting the payload becomes a significant cryptographic challenge. This layer of complexity ensures that even if a file is suspected of harboring a payload, proving its existence and retrieving the contents requires specialized steganalysis techniques that are often outside the scope of standard incident response.
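The key-seeded scattering can be illustrated with a standard PRNG. This is a minimal sketch under the assumption that sender and receiver share an integer key; real implementations vary.

```python
import random

# Key-seeded, non-linear embedding order: the same key reproduces the same
# scattered path through the samples, so only the key holder can find the bits.
def pixel_order(key: int, n_samples: int, n_bits: int):
    rng = random.Random(key)
    return rng.sample(range(n_samples), n_bits)  # unique, key-dependent positions

order_a = pixel_order(key=1337, n_samples=2_000_000, n_bits=32)
order_b = pixel_order(key=1337, n_samples=2_000_000, n_bits=32)
assert order_a == order_b            # same key -> same path, payload recoverable
assert len(set(order_a)) == 32       # no sample is used twice
```

Without the key, a defender faces two million candidate positions per bit, which is why statistical tests, rather than direct extraction, are the usual detection route.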
Beyond Pixels: Hiding Payloads in Image Metadata and Headers
While LSB encoding focuses on the visual data of an image, a more straightforward and increasingly common method involves the exploitation of non-visual data segments, specifically headers and metadata fields. Every modern image file contains a variety of metadata, such as Exchangeable Image File Format (EXIF) data, which stores information about the camera settings, GPS coordinates, and timestamps. Attackers have recognized that these fields, intended for descriptive text, are essentially unregulated storage bins that can hold malicious strings. By injecting base64-encoded commands or encrypted URLs into the “Artist,” “Software,” or “Copyright” tags of an image, a threat actor can provide instructions to a piece of malware already residing on a victim’s machine. The malware simply “phones home” by downloading a benign-looking image from a public site like Imgur or GitHub and then parses the EXIF data to find its next set of instructions.
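The retrieval side of this pattern is trivially simple, which is part of its appeal. The sketch below uses an invented dictionary as a stand-in for parsed EXIF tags (it is not a real EXIF parser), with a documentation-range IP as the fake instruction.

```python
import base64

# Stand-in for parsed EXIF metadata: the implant reads an innocuous-looking
# tag and base64-decodes it into its next instruction. Values are invented.
fake_exif = {
    "Artist": "Jane Photographer",
    "Software": base64.b64encode(b"connect 203.0.113.7:8443").decode("ascii"),
}

hidden_command = base64.b64decode(fake_exif["Software"])
assert hidden_command == b"connect 203.0.113.7:8443"
# To a casual reviewer, the field is just an odd-looking software version string.
```

Because the download is an ordinary image fetch and the decode is an ordinary string operation, nothing in this flow looks like code execution to a network monitor.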
This technique is particularly effective for maintaining Command and Control (C2) infrastructure because it mimics legitimate web traffic. A firewall is unlikely to block an internal workstation from reaching a common image-hosting domain, and the payload itself is never “executed” in the traditional sense; it is merely read as a string by a separate process. Beyond standard metadata, hackers also target the internal structure of the file format itself, such as the “Comment” segments in JPEGs or the “chunks” in a PNG file. PNG files are organized into discrete blocks of data—such as IHDR for header information and IDAT for the actual image data—but the specification also allows for “ancillary chunks” (like tEXt or zTXt) which are ignored by most image viewers. An attacker can create custom, non-critical chunks that contain large volumes of data, effectively turning a simple icon into a delivery vehicle for a multi-stage malware dropper.
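The PNG chunk layout itself is simple enough to sketch. The format below (4-byte big-endian length, 4-byte type, data, CRC-32 over type plus data) follows the PNG specification; the payload string is of course invented.

```python
import struct
import zlib

# Build a single PNG chunk exactly as the spec lays it out:
# length | type | data | CRC-32(type + data)
def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    return (struct.pack(">I", len(data))
            + chunk_type
            + data
            + struct.pack(">I", zlib.crc32(chunk_type + data)))

payload = b"Comment\x00stage2-config-goes-here"   # invented smuggled bytes
chunk = make_chunk(b"tEXt", payload)

assert chunk[4:8] == b"tEXt"                       # ancillary: lowercase first letter
assert struct.unpack(">I", chunk[:4])[0] == len(payload)
```

A decoder that honors the spec skips any ancillary chunk it does not recognize, so such a chunk can carry arbitrary bytes while the image still renders normally.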
One of the most dangerous manifestations of this header manipulation is the creation of “polyglot” files. A polyglot is a file that is valid under two different file formats simultaneously. For example, a skilled attacker can craft a file that begins with the “magic bytes” of a GIF file (47 49 46 38, the ASCII bytes for “GIF8”), ensuring that any image viewer or web browser treats it as a graphic, but also contains a valid Java Archive (JAR) or a web-based script further down in its structure. When this file is handled by a browser, it displays as an image, but if it is passed to a script interpreter or a specific application vulnerability, it executes as code. This dual-identity approach creates a massive blind spot for security products that rely on file-type identification to apply security policies. By blending the executable logic with the static data of an image, hackers have successfully created “stealth” files that are nearly impossible to categorize correctly without deep, byte-level inspection of the entire file body.
Text-Based Subversion: Linguistic Steganography and Zero-Width Characters
While the manipulation of high-entropy image files provides a vast playground for hiding data, hackers often prefer the simplicity and ubiquity of text files to evade modern detection engines. Text-based steganography is particularly dangerous because it exploits the very foundation of digital communication: the way we render characters on a screen. One of the most sophisticated methods involves the use of Unicode zero-width characters. These are non-printing characters, such as the Zero-Width Joiner (U+200D) or the Zero-Width Space (U+200B), which are designed to handle complex ligatures or invisible word breaks. Because these characters have no visual width, they are completely invisible to a human reading a text file or an administrator viewing a configuration script. However, to a computer, they are distinct pieces of data. An attacker can map these invisible characters to binary values—for instance, using a Zero-Width Joiner to represent a ‘1’ and a Zero-Width Non-Joiner to represent a ‘0’—allowing them to embed an entire encoded script inside a perfectly normal-looking README.txt file or even a social media post.
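The mapping described above (Zero-Width Joiner as a ‘1’, Zero-Width Non-Joiner as a ‘0’) is easy to demonstrate. This is a minimal sketch of the technique, not any particular tool.

```python
ZWJ, ZWNJ = "\u200d", "\u200c"  # zero-width joiner = 1, zero-width non-joiner = 0

def hide(cover: str, secret: bytes) -> str:
    """Append the secret as invisible characters; renders identically to cover."""
    bits = "".join(f"{byte:08b}" for byte in secret)
    return cover + "".join(ZWJ if b == "1" else ZWNJ for b in bits)

def reveal(text: str) -> bytes:
    """Collect the invisible characters and rebuild the hidden bytes."""
    bits = "".join("1" if ch == ZWJ else "0" for ch in text if ch in (ZWJ, ZWNJ))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

cover = "Totally normal README text."
stego = hide(cover, b"key")
assert stego.startswith(cover)       # visible content unchanged
assert len(stego) > len(cover)       # yet it carries 24 extra characters
assert reveal(stego) == b"key"
```

Printed or displayed, `stego` and `cover` are indistinguishable; only a byte-level diff or a character-category filter exposes the difference.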
Beyond the use of “invisible” characters, hackers frequently leverage whitespace steganography, a technique that hides information in the trailing spaces and tabs of a document. In environments where source code is frequently moved between developers, a file containing extra spaces at the end of lines is rarely viewed with suspicion; it is usually dismissed as poor formatting or a byproduct of different text editors. Tools like “Snow” have long been used to conceal messages in this manner, effectively turning the “empty” space of a document into a covert storage medium. This is particularly effective in bypassing Data Loss Prevention (DLP) systems that are programmed to look for specific keywords or patterns of sensitive data like credit card numbers. By breaking a sensitive string into binary and hiding it as a series of tabs and spaces within a large corporate policy document, the data can be exfiltrated without triggering any signature-based alarms, as the document’s visible content remains entirely benign and policy-compliant.
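A deliberately simplified version of this idea can be shown in a few lines. Real tools like Snow use denser encodings; here, purely for illustration, each line carries a single bit as either a trailing tab (1) or a trailing space (0).

```python
# Whitespace-steganography sketch (simplified encoding, one bit per line):
# trailing tab = 1, trailing space = 0. Document text is invented.
def encode_ws(lines, bits):
    return [line + ("\t" if bit else " ") for line, bit in zip(lines, bits)]

def decode_ws(lines):
    return [1 if line.endswith("\t") else 0 for line in lines]

policy = ["Acceptable Use Policy", "Section 1: Scope", "Section 2: Enforcement"]
bits = [1, 0, 1]
stego_lines = encode_ws(policy, bits)

assert [l.rstrip() for l in stego_lines] == policy  # visible text is unchanged
assert decode_ws(stego_lines) == bits
```

Any viewer, diff tool, or DLP rule that ignores trailing whitespace sees only the benign policy text.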
Linguistic steganography represents the peak of this deceptive art, shifting the focus from bit-level manipulation to the nuances of human language itself. Rather than relying on technical “glitches” or hidden characters, this method involves altering the structure of sentences to carry a hidden message. By using a pre-defined dictionary and specific grammatical variations, an attacker can construct sentences that appear natural but encode specific data points based on word choice or sentence length. For example, a seemingly innocent email about a lunch meeting could, through a specific arrangement of adjectives and nouns, encode the IP address of a new Command and Control server. This form of “mimicry” is incredibly difficult for automated systems to detect because it does not involve any unusual file properties or illegal characters. It relies on the semantic flexibility of language, making it one of the most resilient forms of covert communication available to sophisticated threat actors who need to maintain long-term, low-profile access to a target network.
Real-World Weaponization: Case Studies in Malware and Data Exfiltration
The transition of steganography from a theoretical concept to a primary weapon in the wild is best illustrated by the evolution of exploit kits and state-sponsored campaigns. One of the most notorious examples is the Stegano exploit kit, which gained notoriety for hiding its malicious logic within the alpha channel of PNG images used in banner advertisements. The alpha channel, which controls the transparency of pixels, provides a perfect hiding spot because small variations in transparency are virtually impossible for a human to see against a standard web background. By embedding encrypted code in these advertisements, the attackers were able to redirect users to malicious landing pages without the users ever clicking a link or the ad-networks ever detecting the payload. This “malvertising” campaign demonstrated that steganography could be scaled to target millions of users simultaneously, turning the visual infrastructure of the internet into a delivery system for ransomware and banking trojans.
Advanced Persistent Threat (APT) groups, such as the North Korean-linked Lazarus Group, have refined these techniques to maintain persistence within highly secured environments. In several documented campaigns, Lazarus utilized BMP (bitmap) files to deliver second-stage malware. These images, often disguised as legitimate documents or icons, contained encrypted DLL files hidden within their pixel data. Once the initial dropper was executed on a victim’s machine, it would download the BMP file, extract the hidden bytes from the image data, and load the malicious DLL directly into memory. This “fileless” approach is a nightmare for traditional antivirus solutions because the malicious code never exists as a standalone file on the disk; it is only reconstructed at runtime from the components hidden within the benign image. This method effectively neutralizes most perimeter defenses that rely on file-scanning, as the image file itself is technically valid and non-executable.
The use of steganography is not limited to the delivery of malware; it is equally effective for the silent exfiltration of sensitive data. During a major breach of a global financial institution, investigators discovered that insiders were using high-resolution digital photographs to smuggle proprietary trading algorithms out of the network. By using LSB encoding to hide the source code within the photos of “office pets” and “company outings,” the attackers were able to bypass DLP systems that were specifically tuned to block the transmission of code-like text or large archives. Because the files remained valid JPEGs, they were permitted to be uploaded to personal cloud storage and social media accounts. This highlights a critical flaw in many modern security architectures: the assumption that if a file looks like an image and acts like an image, it is nothing more than an image. These real-world cases prove that steganography is the ultimate tool for bypassing the “secure” perimeters that organizations rely on.
Detection and Defiance: The Technical Challenges of Steganalysis
Detecting the presence of hidden data within a carrier file, a field known as steganalysis, is a game of statistical probability rather than binary certainty. Unlike traditional virus detection, which relies on matching a file’s hash or signature against a database of known threats, steganalysis must look for anomalies in the file’s expected data distribution. One of the most common technical approaches is the use of Chi-squared (χ²) tests, which analyze the distribution of pixel values in an image. In a natural, unmodified image, the frequency of adjacent color values tends to follow a predictable pattern. However, when an attacker injects a binary payload into the Least Significant Bits, they introduce a level of artificial entropy that flattens this distribution. This statistical “signature” of randomness is often the only clue that an image has been tampered with. Specialized tools can scan directories of images, flagging those with an unusually high degree of LSB entropy for further investigation by forensic analysts.
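The flattening effect is visible even in a heavily simplified version of the pair-of-values chi-squared test. The sketch below (synthetic data, not a production steganalysis tool) compares the counts of each value pair (2k, 2k+1): a random payload in the LSBs pushes those counts toward equality, so the statistic drops.

```python
import random
from collections import Counter

# Simplified pair-of-values chi-squared statistic over 8-bit samples.
# Natural data tends to have skewed (2k, 2k+1) pairs; full LSB embedding
# with a random payload equalizes them, shrinking the statistic.
def chi2_pairs(samples):
    counts = Counter(samples)
    stat = 0.0
    for k in range(128):
        a, b = counts[2 * k], counts[2 * k + 1]
        expected = (a + b) / 2
        if expected:
            stat += (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    return stat

rng = random.Random(0)
natural = [200 if rng.random() < 0.9 else 201 for _ in range(10_000)]  # skewed pair
stego = [200 | rng.getrandbits(1) for _ in range(10_000)]              # flat LSBs

assert chi2_pairs(natural) > chi2_pairs(stego)  # embedding flattens the pairs
```

Real steganalysis works window-by-window over actual pixel data and calibrates against compression effects, but the underlying signal is the same.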
Despite the power of statistical analysis, defenders face a significant hurdle known as the “Clean Image” problem. Steganalysis is exponentially more accurate when the analyst has access to the original, unmodified version of the file for comparison. Without this baseline, it is remarkably difficult to prove that a slight color variation or a specific metadata string is a malicious injection rather than a byproduct of the camera’s sensor noise or a specific compression algorithm. Furthermore, as attackers shift toward more sophisticated embedding methods—such as spread-spectrum steganography, which distributes the payload across many different frequencies within the image data—traditional statistical tests often fail. These techniques mimic the natural noise of the medium so closely that the signal-to-noise ratio becomes nearly impossible to decipher without the original key. This mathematical reality means that for many organizations, detection is not a scalable solution; instead, the focus must shift toward proactive neutralization.
Proactive defense, or “active warden” strategies, involve the automated sanitization of all incoming media files to ensure that any potential hidden channels are destroyed. Rather than trying to detect if a file is “guilty,” security gateways can be configured to “clean” every file by default. For images, this might involve re-compressing a JPEG, which slightly alters pixel values and effectively wipes out LSB-embedded data. For text files, a “sanitizer” can strip out all non-printing Unicode characters and normalize whitespace, effectively neutralizing zero-width character attacks. In high-security environments, some organizations go as far as “image flattening,” where an image is rendered into a canvas and then re-captured as a completely new file, ensuring that only the visual information survives and any hidden binary logic in the headers or metadata is discarded. This “zero-trust” approach to media handling is the only way to reliably defeat an adversary that specializes in hiding in plain sight.
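For text, an active-warden sanitizer can be remarkably small. The sketch below strips Unicode format characters (category Cf, which includes the zero-width characters discussed earlier) and trims trailing whitespace; it is a minimal illustration, not a complete gateway.

```python
import re
import unicodedata

# Active-warden text sanitizer: destroy zero-width and whitespace channels
# while leaving the visible content intact.
def sanitize(text: str) -> str:
    # Drop Unicode "format" characters (category Cf covers U+200B..U+200D etc.)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Strip trailing spaces/tabs from every line
    return re.sub(r"[ \t]+(?=\n|$)", "", text)

dirty = "launch\u200b\u200dcodes   \nnext line\t"
assert sanitize(dirty) == "launchcodes\nnext line"
```

Applied by default to every inbound document, this neutralizes both zero-width and trailing-whitespace channels without needing to decide whether any given file is “guilty.”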
Conclusion: The Future of Covert Channels in an AI-Driven World
The arms race between steganographers and security researchers is entering a new, more volatile phase driven by the rise of generative artificial intelligence. We are moving beyond the era of simply “hiding” data in existing files toward the era of “generative steganography,” where AI models can create entirely new, high-fidelity images or text blocks specifically designed to house a hidden payload from their very inception. These AI-generated carriers can be engineered to be statistically perfect, matching the expected entropy of a natural file so precisely that traditional steganalysis tools are rendered obsolete. As attackers begin to use Large Language Models (LLMs) to generate “innocent” emails that encode complex command-and-control instructions within the very flow of the prose, the challenge for defenders will shift from technical detection to semantic analysis. The “invisible” threat is becoming smarter, more adaptive, and more integrated into the standard tools of digital communication.
Ultimately, the resurgence of steganography serves as a critical reminder that cybersecurity is as much about psychology and subversion as it is about bits and bytes. By focusing exclusively on the “gates” of our networks—the firewalls, the encryptions, and the passwords—we have left the “windows” of our daily digital interactions wide open. A JPEG is rarely just a JPEG, and a text file is rarely just text. As long as there is a medium for communication, there will be a way to subvert it for covert purposes. For the modern security professional, the lesson is clear: true security requires a healthy skepticism of even the most benign-looking assets. Implementing deep-file inspection, automated media sanitization, and a rigorous zero-trust policy for all file types is no longer an optional luxury; it is a fundamental necessity in a world where the most dangerous threats are the ones you can’t see.
Call to Action
If this breakdown helped you think a little clearer about the threats out there, don’t just click away. Subscribe for more no-nonsense security insights, drop a comment with your thoughts or questions, or reach out if there’s a topic you want me to tackle next. Stay sharp out there.
D. Bryan King
Sources
NIST SP 800-101 Rev. 1: Guidelines on Mobile Device Forensics (Steganography Overview)
MITRE ATT&CK: Steganography (T1027.003)
CISA Analysis Report (AR21-013A): Malicious Steganography in SolarWinds Aftermath
Verizon 2024 Data Breach Investigations Report (DBIR)
Kaspersky: Steganography in Contemporary Cyberattacks
Mandiant: Sophisticated Steganography in Targeted Attacks
SentinelOne: Digital Steganography and Malware Persistence
Krebs on Security: Malware Hides in Plain Sight via Steganography
Palo Alto Unit 42: Steganography in the Wild
McAfee Labs: The Art of Hiding Data Within Data
SANS Institute: Steganography – Hiding Data Within Data
Dark Reading: Why Steganography is the Next Frontier
Center for Internet Security (CIS): The Basics of Steganography
IEEE Xplore: A Review on Image Steganography Techniques
Disclaimer:
The views and opinions expressed in this post are solely those of the author. The information provided is based on personal research, experience, and understanding of the subject matter at the time of writing. Readers should consult relevant experts or authorities for specific guidance related to their unique situations.
#APTTechniques #binaryEncoding #C2Channels #chiSquaredTest #CISAReports #commandAndControl #covertCommunication #cyberDefense #cyberThreats #cyberWarfare #cybersecurity #dataExfiltration #dataLossPrevention #digitalForensics #digitalWatermarking #DLPBypass #encryptionVsSteganography #entropyAnalysis #EXIFData #exploitKits #fileSanitization #filelessMalware #forensicAnalysis #GIFAR #hiddenPayloads #hiddenScripts #imageSteganography #informationHiding #LazarusGroup #leastSignificantBit #linguisticSteganography #LSBEncoding #maliciousImages #malwareDetection #malwarePersistence #memoryInjection #metadataExploitation #MITREATTCK #networkSecurity #NISTSP800101 #obfuscation #payloadDelivery #pixelManipulation #polyglotFiles #RGBPixelData #securityResearch #SOCAnalyst #statisticalAnalysis #steganalysis #SteganoExploitKit #steganography #technicalDeepDive #textSteganography #threatHunting #UnicodeExploits #whitespaceSteganography #zeroTrust #zeroWidthCharacters -
🚀 Welcome to the riveting world of Apache Otava, where you can enjoy the thrill of #CSV files and #PostgreSQL databases like they're action movies. 🎬 Dive into the exhilarating quest for changepoints—because who doesn't love a good statistical analysis party? 🎉 And remember, folks, it's #incubating, which is a fancy way of saying it's still figuring out how to adult. 🙃
https://otava.apache.org/ #ApacheOtava #StatisticalAnalysis #DataScience #HackerNews #ngated -
Comparison of mice imputation with Nonlinear Nonparametric Statistics (NNS) and k-Nearest Neighbor (kNN).
Check out my course for more details: https://statisticsglobe.com/online-course-missing-data-imputation-r
-
Variations In Road Exposure And Traffic Volumes In The United States In Areas Susceptible To Landslides [statistical breakdown & analysis, including spatial]
--
https://doi.org/10.1016/j.ijdrr.2025.105567 <-- shared paper
--
#landslide #road #highway #transportation #hazard #exposure #massmovement #engineeringgeology #geology #threat #model #modeling #statistics #national #roadsystem #spatialanalysis #lengths #percentages #engineering #mitigation #statisticalanalysis #safety #susceptibility #publicsafety #disruption #humanimpacts #regionalassessment -
Is Rotten Tomatoes Still Reliable? A Statistical Analysis
https://www.statsignificant.com/p/is-rotten-tomatoes-still-reliable
#HackerNews #RottenTomatoes #Reliability #StatisticalAnalysis #MovieRatings #FilmCritique #DataAnalysis
-
What's behind the headlines - #datatracking #datacollection #research #statisticalanalysis #reporting
Trump says #BureauLaborStatistics ‘scam.’ Here’s how the jobs report really works | CNN Business https://www.cnn.com/2025/08/04/business/bureau-of-labor-statistics-jobs-report-explainer-hnk
-
Beyond the Dataset
In the recent season of the show Clarkson’s Farm, J.C. goes to great lengths to buy the right pub. As with any sensible buyer, the team does a thorough tear-down followed by a big build-up before the place opens for business. They survey how the place is built, located, and accessed. In their refresh they ensure that each part of the pub is built with purpose. Even the tractor on the ceiling. The art is in answering the question: how was this place put together?
A data scientist should be equally fussy. Until we trace how every number was collected, corrected, and cleaned (who measured it, what tool warped it, what assumptions skewed it) we can’t trust the next step in our business to flourish.
Old Sound (1925), painting in high resolution by Paul Klee. Original from the Kunstmuseum Basel. Digitally enhanced by rawpixel.
Two load-bearing pillars
While there are many flavors of data science, I’m concerned here with the analysis done in scientific spheres and startups. In this world, the structure is held up by two pillars:
- How we measure — the trip from reality to raw numbers. Feature extraction.
- How we compare — the rules that let those numbers answer a question. Statistics and causality.
Both of these relate to having a deep understanding of the data-generating process, each from a different angle. A crack in either pillar and whatever sits on top crumbles: plots, significance tests, AI predictions all mean nothing.
How we measure
A misaligned microscope is the digital equivalent of crooked lumber. No amount of massaging can birth a photon that never hit the sensor. In fluorescence imaging, the point-spread function tells you how a pin-point of light smears across neighboring pixels; noise reminds you that light arrives, and is recorded, with some inherent randomness. Misjudge either and the cell you call “twice as bright” may be a mirage.
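The smearing can be shown with a toy one-dimensional example (all numbers invented; real point-spread functions are measured, not assumed):

```python
# Toy 1-D point-spread blur: a single bright pixel leaks into its neighbors,
# so observed brightness ratios need not match the source. Kernel is invented.
psf = [0.25, 0.5, 0.25]

def blur(signal, kernel):
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(signal))]

point_source = [0, 0, 100, 0, 0]
observed = blur(point_source, psf)
assert observed == [0.0, 25.0, 50.0, 25.0, 0.0]  # energy smeared into neighbors
```

Half the photons recorded at the center pixel’s neighbors never came from those locations; any downstream intensity comparison has to account for that.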
In this data-generating process, the instrument’s nuances control what you see. Understanding them lets us judge what kind of post-processing is right and which kind may destroy or invent data. For simpler analyses, post-processing can stop at cleaner raw data. For developing AI models, the process extends to labeling and analyzing data distributions. Andrew Ng’s data-centric AI approach insists that tightening labels, fixing sensor drift, and writing clear provenance notes often beat fancier models.
How we compare
Now suppose Clarkson were to test a new fertilizer, fresh goat pellets, only on sunny plots. Any bumper harvest that follows says more about sunshine than about the pellets. Sound comparisons begin long before the data arrive. A deep understanding of the science behind the experiment is critical before conducting any statistics. Wrong randomization, missing controls, and lurking confounders eat away at the foundation of statistics.
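The goat-pellet trap is easy to reproduce in a simulation (all numbers invented): sunshine alone drives yield, pellets are applied only to sunny plots, and the naive comparison still reports a large “effect.”

```python
import random

# Confounded trial simulation: treatment (pellets) is assigned by the
# confounder (sunshine), and the pellets themselves add nothing to yield.
rng = random.Random(42)
plots = []
for _ in range(500):
    sunny = rng.random() < 0.5
    pellets = sunny                                          # no randomization
    yield_kg = 100 + (30 if sunny else 0) + rng.gauss(0, 5)  # sunshine only
    plots.append((pellets, yield_kg))

treated = [y for p, y in plots if p]
control = [y for p, y in plots if not p]
naive_effect = sum(treated) / len(treated) - sum(control) / len(control)

assert naive_effect > 20  # a big "fertilizer effect" from a useless fertilizer
```

Randomizing the pellet assignment independently of sunshine collapses this difference toward zero; nothing in the dataset itself tells you which design produced it.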
This information is not in the data. Only understanding how the experiment was designed, and which events preclude others, enables us to build a model of the world of the experiment. Taking this lightly carries large risks for startups with limited budgets and smaller experiments: a false positive leads to wasted resources, while a false negative presents opportunity costs.
The stakes climb quickly. Early in the COVID-19 pandemic, some regions bragged of lower death rates. Age, testing access, and hospital load varied wildly, yet headlines crowned local policies as miracle cures. When later studies re-leveled the footing, the miracles vanished.
Why the pillars get skipped
Speed, habit, and misplaced trust. Leo Breiman warned in 2001 that many analysts chase algorithmic accuracy and skip the question of how the data were generated, a split he called the “two cultures.” Today’s tooling tempts us even more: auto-charts, one-click models, pretrained everything. They save time, until they cost us the answer.
The other issue is the lack of a culture that communicates and shares a common language. Only in academic training is it possible to train a single person to understand the science, the instrumentation, and the statistics well enough for their research to be taken seriously; even then we prefer peer review. There is no such scope in startups. Tasks and expertise must be split. It falls to the data scientist to ensure clarity and to collect information horizontally. It is the job of leadership to enable this, or to accept dumb risks.
Opening day
Clarkson’s pub opening was a monumental task, with a thousand details tracked and tackled by an army of experts. Follow the journey from phenomenon to file, guard the twin pillars of measure and compare, and reinforce them with careful curation and an open culture. Do that, and your analysis leaves room for the most important thing: inquiry.
#AI #causalInference #cleanData #dataCentricAI #dataProvenance #dataQuality #dataScience #evidenceBasedDecisionMaking #experimentDesign #featureExtraction #foundationEngineering #instrumentation #measurementError #science #startupAnalytics #statisticalAnalysis #statistics
-
Using dplyr and ggplot2 in R can significantly streamline your data analysis process, making it easier to work with complex data sets.
I have created a video tutorial in collaboration with Albert Rapp, where I demonstrate how to do this in practice: https://www.youtube.com/watch?v=EKISB0gnue4
#coding #datavisualization #rprogramming #dataviz #statisticalanalysis #package #datastructure #ggplot2 #bigdata #tidyverse
-
Oh joy, another statistical computing environment! 🙄 #LispStat promises to be the R you didn't ask for, but in Lisp, because why not add some parentheses to your data woes? 🤔 Perfect for those who enjoy statistical analysis with a side of vintage programming language nostalgia. 📉👴
https://lisp-stat.dev/about/ #Rlanguage #StatisticalAnalysis #ProgrammingNostalgia #DataScience #HackerNews #ngated -
Lisp-stat: Lisp environment for statistical computing
#HackerNews #LispStat #Lisp #Computing #StatisticalAnalysis #DataScience #ProgrammingLanguages
-
Final reminder that registration for all Statistics Globe online courses closes today and won’t reopen until the end of July.
You can find all courses here: https://statisticsglobe.com/courses
#statistics #datascience #rasts #dataviz #statisticalanalysis
-
When handling missing values, selecting an imputation method that balances simplicity, variability, and accuracy is essential. Deterministic Regression, Stochastic Regression, and Predictive Mean Matching (PMM) are three widely used methods, each with strengths and limitations depending on the data's structure.
The attached plot compares these methods using a non-linear data example.
Tutorial: https://statisticsglobe.com/predictive-mean-matching-imputation-method/
More: http://eepurl.com/gH6myT
-
Creating publication-ready plots in R is easier than ever with ggpubr. This extension for ggplot2 simplifies the process of generating clean and professional graphics, especially for exploratory data analysis and reporting.
The attached visual, which I created using ggpubr, demonstrates its versatility.
Additional information: https://statisticsglobe.com/online-course-data-visualization-ggplot2-r
#bigdata #visualanalytics #tidyverse #programming #statisticalanalysis #datavisualization #package #data #ggplot2
-
Making your data analysis more insightful and informative is effortless with ggstatsplot. This powerful ggplot2 extension in R combines statistical analysis and data visualization in a single workflow, helping you generate plots that include statistical summaries directly on the visualizations.
The attached visual, which I created using ggstatsplot, showcases its capabilities.
Learn more: https://statisticsglobe.com/online-course-data-visualization-ggplot2-r
#programming #statisticalanalysis #datavisualization #dataanalytics
-
Today is the final day to register for my courses before a 3-month break with no new enrollments, and your last chance to get a 33% discount.
Here are the courses you can join: https://statisticsglobe.com/courses
#rstats #statistics #datascience #dataviz #statisticalanalysis
-
If you're still using raw R outputs for presentations, it's time for an upgrade! Tools like gtsummary bring your statistical results to life, making them much more digestible for non-technical audiences.
The visualization included here was originally shared in a post by Dr. Alexander Krannich. Thanks to Alexander for inspiring me to create this post.
More details are available at this link: http://eepurl.com/gH6myT
-
Dumb statistical models, always making people look bad
#HackerNews #DumbStatisticalModels #AlwaysLookBad #StatisticalAnalysis #DataScience #Critique
-
To give you a sneak peek into my courses, I’ve just published a free video on YouTube that walks through a complete data project in R.
Watch the video here: https://www.youtube.com/watch?v=l2OgRdofp90
#rstats #statistics #datascience #dataviz #statisticalanalysis #pca
-
Working with text in ggplot2 plots can be a mess, especially when dealing with overlapping labels, busy backgrounds, or the need for custom formatting. Thankfully, several powerful ggplot2 extensions make text manipulation and annotation much easier and more effective.
With these tools, text in ggplot2 becomes much more manageable and visually appealing.
#ggplot2 #statisticalanalysis #package #visualanalytics #rstudio #tidyverse #datavisualization #datascience
-
In missing data imputation, it is crucial to compare the distributions of imputed values against the observed data to better understand the structure of the imputed values.
The visualization below can be generated using the following R code:
library(mice)
my_imp <- mice(boys)
densityplot(my_imp)
Take a look here for more details: https://statisticsglobe.com/online-workshop-missing-data-imputation-r
#datastructure #statisticalanalysis #dataanalytics #visualanalytics #pythoncoding #package #datavisualization #datascience
-
Avoiding text overlap in plots is essential for clarity, and R offers a great solution with the ggplot2 and ggrepel packages. By automatically repositioning labels, ggrepel keeps your plot clean and easy to interpret.
Video: https://www.youtube.com/watch?v=5lu4h_CPhi0
Website: https://statisticsglobe.com/avoid-overlap-text-labels-ggplot2-plot-r
Take a look here for more details: https://statisticsglobe.com/online-course-data-visualization-ggplot2-r
#pythonprogramminglanguage #statisticalanalysis #datascience #datastructure #package #rstudio
-
Dive into our latest blog on #StatisticalAnalysis using #Pandas! Learn to calculate essential statistics for better data insights. Perfect for analysts and data scientists. Explore now!
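A minimal sketch of the kind of essential statistics the post refers to (the data and column names here are made up for illustration; the pandas calls themselves are standard):

```python
import pandas as pd

# Illustrative data; the column names are invented for this example.
df = pd.DataFrame({
    "height_cm": [160, 172, 168, 181, 175],
    "weight_kg": [55, 70, 63, 82, 74],
})

print(df["height_cm"].mean())    # arithmetic mean
print(df["height_cm"].median())  # middle value
print(df["height_cm"].std())     # sample standard deviation (ddof=1)
print(df.corr())                 # pairwise Pearson correlations
print(df.describe())             # count, mean, std, min, quartiles, max
```

`describe()` alone covers most of the routine summary statistics an analyst reaches for first; `corr()` adds the pairwise relationships.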
-
2024 was the year of Linux… on Pornhub
#linux #StatisticalAnalysis
Every year the well-known adult site Pornhub publishes its annual usage statistics. Among the many metrics it considers is the one on the operating systems most used by its visitors. 2024, like 2023, has given us penguins a very telling statistic.
https://www.marcosbox.com/2024/12/11/il-2024-e-stato-lanno-di-linux-su-pornhub/
-
How Statistical Analysis is Reshaping Prairie Dog Science
➡️ https://theplanet.substack.com/p/how-statistical-analysis-is-reshaping
#prairiedogs #yesJMPcan @JMP_software #data #statisticalanalysis #science
-
CW: Help with statistical analysis of E2 trough results for a friend, whose pmol/L per mg is steadily increasing
Hey folks
A friend of ours has been experiencing something odd that we've not seen before:
- Her estradiol (E2) pmol/L per mg level has been gradually rising over the last 305 days.
... and we lack the understanding of statistical analysis to be able to analyse it properly 😞
All injections were done weekly, using estradiol enanthate compounded at 40 mg/mL from the same homebrewer.
We don't think markdown tables work in Glitch-SOC, so here's all the data we have in a non-tabular form.
Date: 2023-12-28
Volume: 0.13 mL
Dose: 5.2 mg
Trough: 724 pmol/L
Pmol/L per mg: ~139

Date: 2024-03-26
Volume: 0.17 mL
Dose: 6.8 mg
Trough: 1272 pmol/L
Pmol/L per mg: ~187

Date: 2024-10-28
Volume: 0.16 mL
Dose: 6.4 mg
Trough: 2491 pmol/L
Pmol/L per mg: ~272

We're not sure how to use the data to model how to get her pmol/L per mg stable, rather than steadily increasing.
The desired aim is to keep her pmol/L trough roughly between 750 to 900 pmol/L, without it increasing steadily over time.
We think this might mean moving to an 8-to-10-day injection cycle and a lower dose (0.10 to 0.12 mL; 4 to 4.8 mg), but we are not sure how to do it.
Can anyone help us out here? :PleadingFace:
#DIYHRT #OpenHRT #injections #TransFem #StatisticalAnalysis #statistics #AskFedi #trans #transgender #estradiol #queer #LGBTQ+ #LGBTQIA+ #simulation #maths