home.social

Search

1000 results for “Data_Ranger”

  1. "In the history of state-sponsored hacking, the spectrum of cyber operations bent on sabotage have ranged from crude “wiper” attacks that destroy data on target computers to the legendary Stuxnet, a piece of malware the US and Israel first deployed in Iran in 2007 to silently accelerate the spinning of nuclear enrichment centrifuges until they destroyed themselves. Now researchers have discovered another chapter in that decades-long evolution of cybersabotage techniques: a 21-year-old specimen of malware capable of tampering with research and engineering software to undetectably sow mayhem—one that may have been used in Iran, even before Stuxnet.

    Vitaly Kamluk and Juan Andrés Guerrero-Saade, two researchers from the cybersecurity firm SentinelOne, on Thursday revealed a breakthrough in the mystery of a piece of malware known as Fast16, a piece of code whose purpose has eluded the cybersecurity world since its existence was first revealed in an NSA leak in 2017. The SentinelOne researchers have now reverse-engineered the Fast16 code, which they say dates back to 2005 and was likely created by either the US government or one of its allies.

    Kamluk and Guerrero-Saade have determined that the Fast16 malware was designed to carry out the most subtle form of sabotage ever seen in an in-the-wild malware tool: By automatically spreading across networks and then silently manipulating computation processes in certain software applications that perform high-precision mathematical calculations and simulate physical phenomena, Fast16 can alter the results of those programs to cause failures that range from faulty research results to catastrophic damage to real-world equipment."

    wired.com/story/fast16-malware

    #CyberSecurity #NSA #Fast16 #StateHacking #Iran #USA

  3. How to measure the European space economy

    Space technology, data and services have become indispensable. They contribute to a wide range of key activities, from…
    #Economy #EconomyofEU #EconomyoftheEU #EUeconomy #Europe #spaceindustry #spaceresearch
    europesays.com/2865910/

  4. Can Carbon Credits Clean Up Big Tech’s AI-Fueled Emissions Surge?

    As multiple large-scale data centres are developed around the globe, Big Tech is investing heavily in a range…
    #NewsBeep #News #Environment #AI #AU #Australia #BigTech #carboncredits #climatepolicy #DataCenters #emissions #Greenwashing #Microsoft #netzero #RENEWABLEENERGY #Science
    newsbeep.com/au/555982/

  5. #SolarUpdate time

    New year, new data... I post these every month, giving people a glimpse into the solar and battery system I've had on my home since early 2023.

    All figures are within a few % of actual, the data reported for the web portal is very close to true... but there's always a slight variance compared to the energy provider, as data drop outs can occur at times if wifi acts flaky for example.

    Total solar generation for Jan 2026 was roughly 163 kWh and we used 99% of that, only exporting 1 kWh.

    We imported roughly 286 kWh of energy.

    That's not bad generation for a January. Previous years have been in the 150 kWh range, so we're up a few % on the avg of 155 kWh.

    Total cost of electric including all standing charges and VAT for the month was £92.

    Gas for heating was still the biggest part of the bill. My elderly mum lives with me and struggles with the cold a lot more. So the heating is 2ºC higher than I would have it... and that adds about £30 a month extra to the heating bill over the colder months. Typically the heating is used between Oct-April.

    So gas use for Jan inc all charges was £126.

    The total bill was £227 and some change.

    That's less than previous years... I've done a lot to help insulate the house these last couple of years, and it shows. Prior to that work, the winter bill for Jan 2023 was £267. After the work it's been £222 and £227. So that's an avg of £42 saved just in Jan these last 2yrs.

    The gas bill was almost identical for last year and this year though, in spite of turning the thermostat up. Which means the heating is using more gas when on, but the house is retaining the heat for longer. So if I could turn the thermostat down to my regular 18-19ºC rather than the 20-21ºC it is at the moment, I'd see better savings.

    Obviously, the gas use isn't impacted by the solar/battery system. But I've worked hard to improve the efficiency of the whole house within my budget and abilities. The only way to improve it further would be to replace the cavity wall insulation... and that's expensive as they have to suck out the old stuff and replace with better quality... these days they use small resin bound polystyrene balls to replace the loose stuff they used before, as that drops over time leaving voids under windows and the tops of walls.

    Savings for the month on electric are in the £42 range thanks to the solar generation.

    #Solar
    #Battery
    #GoingGreen
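
The figures in the solar update above can be cross-checked with a few lines of arithmetic. This is only a rough sketch: it uses the rounded numbers from the post, and it assumes that household demand equals self-used solar plus grid imports, ignoring battery losses (the post does not state this).

```python
# Rounded figures quoted in the post for January 2026.
generated_kwh = 163
exported_kwh = 1
imported_kwh = 286

# Solar energy used on site rather than exported.
self_used_kwh = generated_kwh - exported_kwh
self_consumption = self_used_kwh / generated_kwh

# Assumption (not stated in the post): demand = self-used solar + grid imports.
total_demand_kwh = self_used_kwh + imported_kwh
solar_share = self_used_kwh / total_demand_kwh

# Insulation effect: January 2023 bill versus the two later January bills.
jan_2023_bill_gbp = 267
later_jan_bills_gbp = [222, 227]
average_saving_gbp = jan_2023_bill_gbp - sum(later_jan_bills_gbp) / len(later_jan_bills_gbp)

print(f"Self-consumption: {self_consumption:.1%}")
print(f"Share of demand met by solar: {solar_share:.1%}")
print(f"Average January saving: £{average_saving_gbp:.2f}")
```

Under that assumption, solar covered a little over a third of the month's electricity demand, and the quoted 99% self-consumption and roughly £42 average saving both agree with the raw figures.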

  6. Local car retailer urges safer driving after new speeding data is revealed

    The data, released by Volvo Car UK which operates locally from FRF Volvo Swansea on Valley Way, shows that more than four drivers a day were recorded travelling at extreme speeds between September 2024 and August 2025.

    Over the same period, police forces logged more than three million speeding offences in total, though the true figure is likely higher.

    To help address excessive speeding, all models at FRF Volvo Swansea since 2020 have been fitted with Volvo’s 112mph electronic speed limiter. Forming part of the brand’s Vision Zero safety strategy, the limiter aims to ensure no one is seriously injured or killed in a new Volvo vehicle.

    The limiter has been strongly backed by the Swansea retailer, which says the new research highlights the ongoing need to prioritise road safety both locally and nationally.

    John Radcliffe, Retailer Principal, at FRF Volvo Swansea, said:

    “These findings really reinforce how important it is that we all take road safety seriously – not just nationally, but here in Swansea where the wellbeing of local drivers, pedestrians, and families is at stake.

    “Volvo’s 112mph speed limiter is one of many steps we’re taking to help reduce the risks on our roads, and it’s a feature we have strongly supported since it was first introduced five years ago.

    “Limiting top speeds isn’t about restricting drivers – it’s about protecting lives. Slower speeds give people more time to react, reduce the severity of collisions and ultimately make our community safer.

    “At FRF Volvo Swansea, safety is at the heart of everything we do. We want people in Swansea to feel confident that when they get behind the wheel of one of our cars, they’re not just benefiting from modern technology, but from technology designed to keep them and everyone around them safe.”

    For further information about Volvo’s latest range of models, visit frfvolvo.co.uk , or call the retailer on 01792 310999.

    #FRFVolvo #speedLimiter #speeding
  7. US Treasury yields edged higher, reversing previous gains as markets remained range-bound ahead of key US CPI data and after Fed Governor Waller signaled continued rate cuts may be needed in 2025.
    #YonhapInfomax #USTreasury #Yield #FederalReserve #CPI #ChristopherWaller #Economics #FinancialMarkets #Banking #Securities #Bonds #StockMarket
    en.infomaxai.com/news/articleV

  8. So many to choose from: In his latest post, Mark Litwintschik compares a range of global administrative #boundary datasets, from #OpenStreetMap to #NaturalEarth, assessing geometric #accuracy, data #completeness, and information content. The analysis, powered by #DuckDB...
    spatialists.ch/posts/2025/12/1 #GIS #GISchat #geospatial #SwissGIS

  13. More on the crises engulfing the Office for National Statistics:

    To free up staff to rectify problems in a range of economic data, perhaps most obviously the now unreliable Labour Force Survey, the ONS is pulling staff off work on health & crime statistics. It is also looking to pause its annual local survey efforts, from which much data on inequality is derived.

    While we need good (better) economic data, this should not come at the cost of vital crime & health statistics.

    #ONS #politics
    h/t FT

  14. @transfers This is curious. This is (was?) the range used by Kolkata IX bgp.tools/ixp/Kolkata+IX. Last modified on WHOIS data is 2025-11-06T00:35:58Z

    Maybe maybe, they're going under (as now Extreme IX, DE CIX and AMS IX entered the Kolkata market).

    #Kolkata #India #IX

  15. 3pm and on. Copy and compact. Long range transmissions a lot of data and an unknown amount of actual information. Pondering haystacks, forks and completely different means to dig through this kind of wilderness. Another mug of coffee, on the windowsill. And finally some rays of clear light piercing through the afternoon skies. Sun inside outside and still too fast.

    #outerworld #concrete city #ominous afternoons #autumn in moments #home office hours

  17. Future of Privacy Forum: A Price to Pay: U.S. Lawmaker Efforts to Regulate Algorithmic and Data-Driven Pricing. “‘Algorithmic pricing,’ ‘surveillance pricing,’ ‘dynamic pricing’: in states across the U.S., lawmakers are introducing legislation to regulate a range of practices that use large amounts of data and algorithms to routinely inform decisions about the prices and products offered to […]

    https://rbfirehose.com/2025/08/19/a-price-to-pay-u-s-lawmaker-efforts-to-regulate-algorithmic-and-data-driven-pricing-future-of-privacy-forum/

  18. Yesterday and today, the members' meeting of #DALIA, which promotes the #learning and #teaching of #data skills with a free range of #open #educational resources, took place in Mainz.

    Day 1 was dedicated to the work of the 20 working groups with many good presentations and interesting discussions....

    #chemistry #rdm #fdm #researchdata #NFDI4Chem #fairdata #workshop #research #researchdatamanagement #study #education

  19. Great to see such a strong and diverse presence of @NFDI at #HMC_CON2025!

    The range of contributions highlights the depth and collaborative spirit of the community. Many thanks to you and Stefanie and all other NFDI presenters for driving forward research data management in such a meaningful way.

    #NFDI #ResearchData #OpenScience #RDM #Metadata #MetadataMatters

  24. Protecting Science: TIB builds Dark Archive for arXiv

    Research and science are international, which is why we speak of international scientific communities. A service such as arXiv may be operated by a US-based institution, Cornell University, but it is used by researchers worldwide, as the submission statistics impressively show. Moreover, since the introduction of arXiv Membership in 2010, the funding of arXiv has been partially internationalised. TIB funds the German contribution, together with the Helmholtz Association of German Research Centres (HGF) and the Max Planck Society (MPG).

    What is arXiv?

    The platform arXiv.org is a freely accessible online archive for scientific preprints, i.e. publications of scientific works that have not yet (fully) been peer-reviewed. The arXiv preprint service is of great importance as an information source for physics, mathematics, computer science, and neighbouring subjects. Via arXiv, researchers can access the latest research results even before their formal publication in a quality-assured scientific journal. Since its founding in 1991 as the first online preprint service, arXiv has served as a model for the development of preprint services in other subjects (cf. Rzayeva et al. 2025, https://doi.org/10.31235/osf.io/xdwc4_v2).

    So when the Trump administration makes decisions that have fatal consequences for science and research in the US, the repercussions reach far beyond the Gulf of Mexico: over the last few days, reports have been mounting in German media that researchers fear not only the loss of data, but also the loss of established information portals such as PubMed.

    Research data under threat

    Initiatives such as “Safeguarding Research and Culture” are scrambling to save threatened research data and websites for scientific communities and for posterity. Content under threat ranges from the social sciences (e.g. research on LGBTQIA+ topics) and medicine (e.g. vaccines) to the natural sciences (e.g. climate research).

    While research linked to political debates is subject to the most blatant and egregious reprisals, in principle all research can be threatened by “cost cutting” and restructuring measures. This is evidenced, for example, by the planned shutdown of the renowned 120-year-old atomic spectroscopy group at the National Institute of Standards and Technology (NIST).

    Decentral scientific infrastructures

    Unfortunately, a further escalation of the already dismal curtailing of academic freedom in the US appears likely. Not least because of the great importance of US institutions in the international academic system, these developments affect research infrastructures worldwide. As “Safeguarding Research and Culture” write in their mission statement, this warrants a change of mind, among other things towards more decentralised and thus more resilient infrastructures.

    For arXiv, a system that could have helped, at least for some time, was in place until last year: in the early days of the internet, which were also the early days of arXiv, there existed alongside the main server arxiv.org a network of arXiv mirror sites distributed around the globe, which gave users access to a copy of the arXiv contents geographically closer to their location. A legendary example was the Augsburg arXiv mirror, which often impressed with its shorter access and response latencies.

    With years of technical progress, the differences in performance between the local mirrors (among others at the European Organization for Nuclear Research (CERN), at Los Alamos National Laboratory (LANL), in France, and in Japan) and the main server arxiv.org flattened out. As a result, more than 90 % of the traffic went via the main server and the mirror sites saw little use. Thus, in the view of the arXiv team, the expense of maintaining and updating the mirrors was no longer matched by their “utility and utilization”, as can be read in the arXiv blog entry “Attention arXiv users: arXiv mirrors to shut down September 15th, 2024”.

    After the arXiv system had been migrated to a completely cloud-centric architecture over the last couple of years, those responsible for arXiv came to the conclusion that

    “The arXiv mirror network served a role – acting as a backup for the corpus, allowing some degree of load distribution, and providing improved access for users who were geographically closer to a mirror – that is no longer necessary. arXiv now has multiple backups for the arXiv corpus in place, and the Fastly CDN (Content Delivery Network) that we use to deliver content provides excellent service throughout the world.”

    As a European institution, we have always taken a somewhat different view – and recent developments, unfortunately, appear to confirm our reservations – and have always advocated preserving the mirrors, while also looking for alternatives. Some processes turned out to be cumbersome and complicated, partly because of legal constraints regarding licensing. (Open Access is not absolutely Open Access if authors have granted arXiv an exclusive right for provision.) Some others, we might be able to explore further.

    Why TIB is archiving arXiv data

    What we have implemented over the last few weeks is a Dark Archive of arXiv contents:

    As a first step in building a Dark Archive, rights clearance needs to be addressed, of course. Here, TIB had already commissioned a legal advisory survey back in 2016, in the context of a possible cooperation with arXiv.org. This included studying the licences used by arXiv, which broadly fall into the categories “arXiv.org licence”, “Creative Commons”, and “Public Domain”.

    While nothing stands in the way of archiving the data and metadata as such, the status of these rights would have to be explored in detail if the contents were to be made accessible through a public-facing service. This is especially relevant for resources under arXiv licences, since this licence type has gone through several versions over the years. Between 1991 and 2003, users were even able to upload data objects without explicitly stating a licence.

    But before a user service can even be set up, the data need to be ingested into the TIB infrastructure. For full texts, arXiv itself offers several methods. Since both PDF and (La)TeX sources ought to be part of the TIB Dark Archive, we opted for the download via Amazon S3. arXiv offers this through “requester pays” buckets, meaning that TIB as the fetching entity covers the expenses arising with Amazon Web Services (AWS); see https://info.arxiv.org/help/bulk_data_s3.html. For 2,686,172 fetched datasets with a data volume of just under 10 terabytes, the S3 transfer came to about 900 euros.
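
The “requester pays” mechanism described above can be sketched in Python with the third-party boto3 library. This is a minimal illustration, not TIB's actual ingest workflow: the function names are hypothetical, and the bucket and key are left as parameters because the post does not name them (the arXiv bulk data page linked above documents the real layout). The one essential detail is the `RequestPayer="requester"` argument, which tells S3 that the caller agrees to be billed for the transfer.

```python
from typing import Any


def requester_pays_args(bucket: str, key: str) -> dict[str, Any]:
    """Build get_object arguments that accept the transfer charges."""
    return {"Bucket": bucket, "Key": key, "RequestPayer": "requester"}


def fetch_object(bucket: str, key: str, destination: str) -> None:
    """Download one object from a requester-pays bucket to a local file."""
    # boto3 is imported lazily so the helper above works without any AWS setup.
    import boto3

    # The client uses the caller's own AWS credentials; that account is billed.
    s3 = boto3.client("s3")
    response = s3.get_object(**requester_pays_args(bucket, key))

    # Stream the body in 1 MiB chunks instead of loading it into memory at once.
    with open(destination, "wb") as out:
        for chunk in response["Body"].iter_chunks(chunk_size=1024 * 1024):
            out.write(chunk)
```

Without the `RequestPayer` argument, S3 rejects requests to a requester-pays bucket, which is why the flag is isolated in its own small helper here.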


    Because metadata from arXiv have long been used as a data source for the TIB portal, there was no need to establish a new workflow. Eventually, this also facilitates making the datasets accessible via the TIB portal. One possibility is to supply the arXiv datasets in the TIB portal with a second download link in the background. If the first download link, pointing to the arXiv source, is no longer accessible, the second link comes into play, pointing to the copy now held at TIB. Users of the TIB portal could thus seamlessly access arXiv records, even in case of an outage of the main platform over at Cornell. As mentioned earlier, this accessibility is, however, contingent on the specific licences.
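
The second-download-link idea can be illustrated with a small fallback routine. This is a sketch under assumptions, not the TIB portal's implementation: the function names are hypothetical, and the reachability check is a plain HTTP HEAD request that a real service would likely replace with something more robust.

```python
import urllib.request
from typing import Callable, Optional, Sequence


def http_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to `url` succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses: treat any
        # network or HTTP failure as "not reachable".
        return False


def pick_download_link(
    links: Sequence[str],
    is_reachable: Callable[[str], bool] = http_reachable,
) -> Optional[str]:
    """Return the first reachable link in priority order, or None."""
    for link in links:
        if is_reachable(link):
            return link
    return None
```

Called with the primary link first and the archive copy second, the routine returns the archive copy only when the primary source fails the check. Passing the check in as a parameter keeps the selection logic testable without network access.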

    Moreover, after the first complete transfer of the arXiv holdings, a process needs to be implemented that fetches new arXiv records at regular intervals, as well as versioning information for already existing records.

    “Building a Dark Archive is an expression of our longstanding commitment for a reliable, international academic provision, and as a partner of arXiv. Even though the Dark Archive today only works in the background, it is a key element in safeguarding digital research contents in the long term, because in case of a crisis, we could open the archive.”

    Dr Irina Sens, Deputy Director of TIB

    Dark Archive: Data stored, but not openly accessible

    The data are being stored, but if push comes to shove, further steps would be needed to make them publicly available, because a database service is much more than a mere backup copy of the data: operating a production user-facing service requires not only technical resources, but first and foremost a committed team that takes care, in the background, of diverse aspects such as quality assurance, content curation, or (technical) development.

    In the case of arXiv, it is not only about the accessibility of the papers, the search functionality, the upload services for authors, and further technical services. Rather, it is the integration within the scientific communities that is the heart of arXiv: numerous researchers volunteer to take on roles on various boards, for content moderation, or as Volunteer Developers!

    #LizenzCCBY40INT #data #arXiv #DarkArchive

  25. Protecting Science: TIB builds Dark Archive for arXiv

    diesen Beitrag auf Deutsch lesen

    Research and science are international, hence we are speaking of international scientific communities. A service such as arXiv might be operated by a US-based institution, Cornell University, but arXiv is being used by researchers worldwide, as, e.g., impressively evidenced by the submission statistics. Moreover, since the introduction of arXiv Membership in 2010, the funding of arXiv has been partially internationalised. TIB funds the German contribution, together with the Helmholtz Associaton of German Research Centres (HGF) und the Max Planck Society (MPG).

    What is arXiv?

    The platform arXiv.org is a freely accessible online archive for scientfic preprints, i.e. publications of scientific works that have not yet (fully) been peer-reviewed. The arXiv preprint service holds great importance for providing information to physics, mathematics, computer science, and neighbouring subjects. Via arXiv, researchers are able to access the latest research results, even before their actual publication in a quality-assured scientific journal. Since its founding in 1991 as the first online preprint service, arXiv serves as a model for the development of preprint services in other subjects (cf. Rzayeva et al. 2025, https://doi.org/10.31235/osf.io/xdwc4_v2).

    So when the Trump administration makes decisions that have fatal consequences for science and research in the US, the repercussions reach far beyond the Gulf of Mexico: Over the last days, reports are mounting in German media that attest to researchers not only fearing the loss of data , but also the loss of established information portals  such as PubMed.

    Research data under threat

    Initiatives such as ”Safeguarding Research and Culture” are scrambling to save threatened research data and websites for scientific communities and for posteriority. Contents under threat range from the social sciences (e.g. research on LBGTQIA+ topics) and medicine (e.g. vaccines) to the natural sciences (e.g. climate research).

    While it is research linked to political debates that is subject to the most blatant and egregious reprisals, in principle all research can be threatened by ”cost cutting” and restructuring measures. This is evidenced e.g. by the planned shutdown of the renowned 120 year old atomic spectroscopy group at the National Institute of Science and Technology (NIST).

    Decentral scientific infrastructures

    Unfortunately, a further escaltion of the already dismal curtailing of academic freedom in the US appears to be likely. Not at least due to the great importance of US institutions in the international academic system, these developments affect research infrastructures worldwide. As ”Safeguarding Research and Culture” are writing in their mission statement, this warrants a change of mind, among other things towards more decentralised and thus more resilient infrastructures.

    For arXiv a system which could have helped for at least some time had been in place until last year: In the early days of the internet, which were also the early days of arXiv, besides the main server arxiv.org there existed a network of arXiv mirror sites, distributed around the globe that allowed access to a copy of arXiv contents that were closer to the user location, geographically. A legendary example was the Augsburg arxiv mirror which often convinced with its shorter access and reply latencies.

    With years of technical progress, the differences in performance between the local mirrors (amongst others at the European Organization for Nuclear Research (CERN), at Los Alamos National Laboratory (LANL), in France, and Japan) and the main server arxiv.org flattened out. Resulting in more than 90 % of the traffic going via the main server and little usage of the mirror sites. Thus, in the view of the arXiv team, the expense for maintenance and updating of the mirrors was no longer matched by their ”utility and utilization”, as can be read in an arXiv blog entry under ”Attention arXiv users: arXiv mirrors to shut down September 15th, 2024”.

    After the arXiv system had been migrating to a completely cloud-centric architecture for its services over the last couple of years, those responsbile for arXiv came to the conclusion that

    “The arXiv mirror network served a role – acting as a backup for the corpus, allowing some degree of load distribution, and providing improved access for users who were geographically closer to a mirror – that is no longer necessary. arXiv now has multiple backups for the arXiv corpus in place, and the Fastly CDN (Content Delivery Network) that we use to deliver content provides excellent service throughout the world.“

    As a European institution, we have always taken a bit of a different view – and the recent developments, unfortunately, appear to confirm our reservations – and have always advocated for preserving the mirrors, while also looking for alternatives. Some processes turned out to be cumbersome and complicated, e.g. also due to legal constraints regarding licencing. (Open Access is not absolutely Open Access if authors have granted arXiv an exclusive right for provision.) Some others, we might be able to explore further.

    Why TIB is archiving arXiv data

    What we have implemented over the last few weeks, is to build a Dark Archive of arXiv contents:

    As a first step in building a Dark Archive, the rights clearance needs to be addressed, of course. Here, TIB had already commissioned a legal advisory survey back in 2016, in the context of a possible cooperation with arXiv.org. This included studying the licences used by arXiv, which broadly fall into the categories “arXiv.org licence” , “Creative Commons“, and “Public Domain“.

    While nothing stands in the way of archiving the data and metadata as such, the status of these rights would have to be explored in detail if they were to be made accessible in the context of a public-facing service. This is especially relevant for resources under arXiv licences, since this licence type over the course of the years underwent several versions. Between the years 1991 and 2003, users were even able to upload data objects without explicitly stating a licence.

    But before a user service can be even set up, the data need to be ingested into the TIB infrastructure. Here, arXiv itself offers several methods for full texts. Since both PDF and (La)Tex sources ought to be part of the TIB Dark Archive, we have opted for the download via Amazon S3. This is a possibility arXiv offers as a “requester pays buckets” method – meaning that TIB as the fetching entity covers the expenses arising with Amazon Web Services (AWS) https://info.arxiv.org/help/bulk_data_s3.html. For 2,686,172 fetched datasets with a data volume of just under 10 terabytes, the S3 transfer came to about 900 Euros.

    arXiv website

    Because metadata from arXiv have since a long time been used as a data source for the TIB portal, there was no need to establish a new workflow. Eventually, this also facilitates making the datasets accessible via the TIB-Portal. A possibility for this is, e.g., supplying the arXiv datasets in the TIB portal with a second download link in the background. In case the first download link pointing to the arXiv source is no longer accessible, the second link should come into play, pointing to the now existing copy at TIB. Users of the TIB-Portal could thus seemlessly access arXiv records, even in case of an outage of the main platform over at Cornell. As mentioned earlier, this accessibility is however contingent of the specific licences.

    Moreover, after the first complete transfer of the arXiv holdings, a process needs to be implemented that fetches, at regular intervals, new arXiv records as well as versioning information for existing records.
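
    One way to picture such a periodic update is as a comparison between what is already held locally and what arXiv currently lists. The sketch below assumes, purely for illustration, that both sides can be reduced to a mapping from arXiv identifier to latest version number; how those mappings are obtained is not covered here, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def plan_update(
    local: Dict[str, int], remote: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """Return (new record ids, ids with newer versions) to fetch.

    Both arguments map an arXiv identifier to its latest known version.
    """
    new_records = sorted(rid for rid in remote if rid not in local)
    new_versions = sorted(
        rid for rid in remote if rid in local and remote[rid] > local[rid]
    )
    return new_records, new_versions


# Example with made-up identifiers: one unseen record, one updated record.
held = {"2401.00001": 1, "2401.00002": 2}
listed = {"2401.00001": 3, "2401.00002": 2, "2401.00003": 1}
to_add, to_refresh = plan_update(held, listed)
```

    Here to_add contains the record that is not yet archived and to_refresh the record for which a newer version exists; records that are unchanged are skipped.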

    “Building a Dark Archive is an expression of our longstanding commitment for a reliable, international academic provision, and as a partner of arXiv. Even though the Dark Archive today only works in the background, it is a key element in safeguarding digital research contents in the long term, because in case of a crisis, we could open the archive.”

    Dr Irina Sens, Deputy Director of TIB

    Dark Archive: Data stored, but not openly accessible

    The data are being stored, but if push comes to shove, it would take further steps to make them publicly available. After all, a database service is much more than a mere backup copy of the data: operating a production user-facing service requires not only technical resources but, first and foremost, a committed team that takes care, in the background, of aspects such as quality assurance, content curation, and (technical) development.

    In the case of arXiv, the service comprises not only access to the papers, the search functionality, the upload services for authors, and further technical services. Rather, it is the integration within the scientific communities that is the heart of arXiv: numerous researchers volunteer to take on roles on various boards, in content moderation, or as volunteer developers.

    #LizenzCCBY40INT #data #arXiv #DarkArchive

  26. Protecting Science: TIB builds Dark Archive for arXiv

    Read this post in German

    Research and science are international, which is why we speak of international scientific communities. A service such as arXiv may be operated by a US-based institution, Cornell University, but it is used by researchers worldwide, as the submission statistics impressively show. Moreover, since the introduction of arXiv Membership in 2010, the funding of arXiv has been partially internationalised. TIB funds the German contribution together with the Helmholtz Association of German Research Centres (HGF) and the Max Planck Society (MPG).

    What is arXiv?

    The platform arXiv.org is a freely accessible online archive for scientific preprints, i.e. publications of scientific works that have not yet (fully) been peer-reviewed. The arXiv preprint service is of great importance as a source of information for physics, mathematics, computer science, and neighbouring subjects. Via arXiv, researchers are able to access the latest research results even before their actual publication in a quality-assured scientific journal. Since its founding in 1991 as the first online preprint service, arXiv has served as a model for the development of preprint services in other subjects (cf. Rzayeva et al. 2025, https://doi.org/10.31235/osf.io/xdwc4_v2).

    So when the Trump administration makes decisions that have disastrous consequences for science and research in the US, the repercussions reach far beyond the Gulf of Mexico: in recent days, reports have been mounting in German media of researchers fearing not only the loss of data, but also the loss of established information portals such as PubMed.

    Research data under threat

    Initiatives such as “Safeguarding Research and Culture” are scrambling to save threatened research data and websites for scientific communities and for posterity. Contents under threat range from the social sciences (e.g. research on LGBTQIA+ topics) and medicine (e.g. vaccines) to the natural sciences (e.g. climate research).

    While it is research linked to political debates that is subject to the most blatant and egregious reprisals, in principle all research can be threatened by “cost cutting” and restructuring measures. This is evidenced, for example, by the planned shutdown of the renowned 120-year-old atomic spectroscopy group at the National Institute of Standards and Technology (NIST).

    Decentralised scientific infrastructures

    Unfortunately, a further escalation of the already dismal curtailing of academic freedom in the US appears likely. Not least because of the great importance of US institutions in the international academic system, these developments affect research infrastructures worldwide. As “Safeguarding Research and Culture” write in their mission statement, this warrants a change in thinking, among other things towards more decentralised and thus more resilient infrastructures.

    For arXiv, a system that could have helped, at least for some time, was in place until last year: in the early days of the internet, which were also the early days of arXiv, a network of arXiv mirror sites existed alongside the main server arxiv.org. Distributed around the globe, the mirrors gave users access to a copy of the arXiv contents that was geographically closer to them. A legendary example was the Augsburg arXiv mirror, which often impressed with its shorter access and response latencies.

    With years of technical progress, the differences in performance between the local mirrors (among others at the European Organization for Nuclear Research (CERN), at Los Alamos National Laboratory (LANL), in France, and in Japan) and the main server arxiv.org levelled out. As a result, more than 90% of the traffic went via the main server, and the mirror sites saw little use. Thus, in the view of the arXiv team, the expense of maintaining and updating the mirrors was no longer matched by their “utility and utilization”, as can be read in the arXiv blog entry “Attention arXiv users: arXiv mirrors to shut down September 15th, 2024”.

    After the arXiv system had been migrated to a completely cloud-centric architecture for its services over the last couple of years, those responsible for arXiv came to the conclusion that

    “The arXiv mirror network served a role – acting as a backup for the corpus, allowing some degree of load distribution, and providing improved access for users who were geographically closer to a mirror – that is no longer necessary. arXiv now has multiple backups for the arXiv corpus in place, and the Fastly CDN (Content Delivery Network) that we use to deliver content provides excellent service throughout the world.”

    As a European institution, we have always taken a somewhat different view, and recent developments unfortunately appear to confirm our reservations. We have always advocated for preserving the mirrors, while also looking for alternatives. Some processes turned out to be cumbersome and complicated, partly because of legal constraints regarding licensing. (Open Access is not necessarily Open Access if authors have granted arXiv an exclusive right for provision.) Others we might be able to explore further.

    Why TIB is archiving arXiv data

    What we have done over the last few weeks is build a Dark Archive of arXiv contents:

    As a first step in building a Dark Archive, rights clearance needs to be addressed, of course. Here, TIB had already commissioned a legal advisory survey back in 2016, in the context of a possible cooperation with arXiv.org. This included studying the licences used by arXiv, which broadly fall into the categories “arXiv.org licence”, “Creative Commons”, and “Public Domain”.

    While nothing stands in the way of archiving the data and metadata as such, the status of these rights would have to be examined in detail before the resources could be made accessible through a public-facing service. This is especially relevant for resources under arXiv licences, since this licence type has gone through several versions over the years. Between 1991 and 2003, users were even able to upload data objects without explicitly stating a licence.

    But before a user service can even be set up, the data need to be ingested into the TIB infrastructure. For full texts, arXiv itself offers several methods. Since both PDF and (La)TeX sources ought to be part of the TIB Dark Archive, we have opted for the download via Amazon S3. arXiv offers this as a “requester pays” bucket, meaning that TIB as the fetching entity covers the costs incurred with Amazon Web Services (AWS); see https://info.arxiv.org/help/bulk_data_s3.html. For 2,686,172 fetched datasets with a data volume of just under 10 terabytes, the S3 transfer came to about 900 euros.

    Because metadata from arXiv have long been used as a data source for the TIB portal, there was no need to establish a new workflow. Ultimately, this also makes it easier to offer the datasets via the TIB portal. One possibility is to supply the arXiv datasets in the TIB portal with a second download link in the background. If the first download link, pointing to the arXiv source, is no longer accessible, the second link comes into play, pointing to the copy now held at TIB. Users of the TIB portal could thus seamlessly access arXiv records even during an outage of the main platform at Cornell. As mentioned earlier, however, this accessibility is contingent on the specific licences.

    Moreover, after the first complete transfer of the arXiv holdings, a process needs to be implemented that fetches, at regular intervals, new arXiv records as well as versioning information for existing records.

    “Building a Dark Archive is an expression of our longstanding commitment for a reliable, international academic provision, and as a partner of arXiv. Even though the Dark Archive today only works in the background, it is a key element in safeguarding digital research contents in the long term, because in case of a crisis, we could open the archive.”

    Dr Irina Sens, Deputy Director of TIB

    Dark Archive: Data stored, but not openly accessible

    The data are being stored, but if push comes to shove, it would take further steps to make them publicly available. After all, a database service is much more than a mere backup copy of the data: operating a production user-facing service requires not only technical resources but, first and foremost, a committed team that takes care, in the background, of aspects such as quality assurance, content curation, and (technical) development.

    In the case of arXiv, the service comprises not only access to the papers, the search functionality, the upload services for authors, and further technical services. Rather, it is the integration within the scientific communities that is the heart of arXiv: numerous researchers volunteer to take on roles on various boards, in content moderation, or as volunteer developers.

    #arXiv #DarkArchive #data #LizenzCCBY40INT

  28. For a while now I've been using #Forecastie to display open weather data from:

    openweathermap.org/

    It was, at the time, the best weather app I could find on F-Droid. But while I was Fossify-ing my Simple apps, I decided to see what's available now. I'm quite impressed with Breezy Weather (forked from Geometric Weather):

    f-droid.org/packages/org.breez

    It seems to be able to use a range of open data sources, but the default appears to be:

    open-meteo.com/

    #weather #OpenWeather #OpenMeteo
