AI Writing Tools

Explore the best AI Writing Tools — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • TU Me

    TU Me

    TU (formerly TU Me) is a digital platform developed by Telefónica and operated through its subsidiary Telefónica Innovación Digital. Initially launched in 2012 as a messaging app under the name TU Me, the brand was later revived in 2024 to designate a new suite of digital products focused on privacy, cybersecurity, and digital identity. == TU Me (2012–2014) == TU Me was a free mobile application released by Telefónica in May 2012. It allowed users to make voice calls, send texts, share photos and locations, and store conversation history in the cloud. The app was available for iOS and Android platforms, positioned as an alternative to services like WhatsApp and Viber. Despite early interest, TU Me was discontinued a few years later and removed from major app stores. Telefónica did not continue development of this version beyond its initial release cycle. == TU (2024–present) == In January 2024, Telefónica relaunched the brand TU through its technology subsidiary Telefónica Innovación Digital. Unlike its predecessor, the new TU is not a messaging app but a digital product platform offering solutions in cybersecurity, identity management, and cryptographic technology. The project includes a range of services built with technologies such as artificial intelligence, blockchain, and post-quantum cryptography. It operates independently from Movistar and targets both individual users and businesses. Notable products include: Latch: a digital access control system for securing user accounts. VerifAI: an AI-based tool for detecting manipulated media (images, audio, video). Metashield: software to identify and remove hidden metadata in documents. Wallet: a digital wallet for managing crypto-assets. Quantum Drop: encrypted file transfer system using post-quantum technology. Quantum Encryption: a security tool for IoT and private networks. Gallery: a blockchain-based digital art marketplace.

    Read more →
  • Technical data management system

    Technical data management system

    A technical data management system (TDMS) is a document management system (DMS) pertaining to the management of technical and engineering drawings and documents. Often the data are contained in 'records' of various forms, such as on paper, microfilms or digital media. Hence technical data management is also concerned with record management involving technical data. Technical document management systems are used within large organisations with large scale projects involving engineering. For example, a TDMS can be used for integrated steel plants (ISP), automobile factories, aero-space facilities, infrastructure companies, city corporations, research organisations, etc. In such organisations, technical archives or technical documentation centres are created as central facilities for effective management of technical data and records. TDMS functions are similar to that of conventional archive functions in concepts, except that the archived materials in this case are essentially engineering drawings, survey maps, technical specifications, plant and equipment data sheets, feasibility reports, project reports, operation and maintenance manuals, standards, etc. Document registration, indexing, repository management, reprography, etc. are parts of TDMS. Various kinds of sophisticated technologies such as document scanners, microfilming and digitization camera units, wide format printers, digital plotters, software, etc. are available, making TDMS functions an easier process than previous times. == Constituents of a technical data management system == Technical data refers to both scientific and technical information recorded and presented in any form or manner (excluding financial and management information). A Technical Data Management System is created within an organisation for archiving and sharing information such as technical specifications, datasheets and drawings. Similar to other types of data management system, a Technical Data Management System consists of the 4 crucial constituents mentioned below. === Data planning === Data plans (long-term or short-term) are constructed as the first essential step of a proper and complete TDMS. It is created to ultimately help with the 3 other constituents, data acquisition, data management and data sharing. A proper data plan should not exceed 2 pages and should address the following basics: Types of data (samples, experiment results, reports, drawings, etc.) and metadata (data that summarizes and describes other data. In this case, it refers to details such as sample sizes, experiment conditions and procedures, dates of reports, explanations of drawings, etc.) Means of researches and collections of data (field works, experiments in production lines, etc.) Costs of researches Policies for access, sharing (re-use within the organisation and re-distribution to the public) Proposals for archiving data and maintaining access to it === Data acquisition === Raw data is collected from primary sites of the organisations through the use of modern technologies. Please reference the table below for examples. The data collected is then transferred to technical data centres for data management. === Data management === After data acquisition, data is sorted out, whilst useful data is archived, unwanted data is disposed. When managing and archiving data, the features below of the data are considered. Names, labels, values and descriptions for variables and records. (In the case of TDMS, one example is names of equipments on an equipment datasheet) Derived data from the original data, with code, algorithm or command file used to create them. (In the case of TDMS, one example is an expectation report derived from the analysis of an equipment datasheet) Metadata associates with the data being archived === Data sharing === Archived and managed data are accessible to rightful entities. A proper and complete TDMS should share data to a suitable extent, under suitable security, in order to achieve optimal usage of data within the organisation. It aims for easy access when reused by other researchers and hence it enhances other research processes. Data is often referred in other tests and technical specifications, where new analysis is generated, managed and archived again. As a result, data is flowing within the organisation under effective management through the use of TDMS. == Advantages and disadvantages of usage of technical data management systems == There are strengths and weakness when using technical data management systems (TDMS) to archive data. Some of the advantages and disadvantages are listed below. === Advantages === ==== 1. Faster and easier data management ==== Since TDMS is integrated into the organisation's systems, whenever workers develop data files (SolidWorks, AutoCAD, Microsoft Word, etc.), they can also archive and manage data, linking what they need to their current work, at the same time they can also update the archives with useful data. This speeds up working processes and makes them more efficient. ==== 2. Increased security ==== All data files are centralized, hence internal and external data leakages are less likely to happen, and the data flow is more closely monitored. As a result, data in the organisation is more secured. ==== 3. Increased collaboration within the organisation ==== Since the data files are centralized and the data flow within the organisation increases, researchers and workers within the organisation are able to work on joint projects. More complex tasks can be performed for higher yields. ==== 4. Compatible to various formats of data ==== TDMS is compatible to many formats of data, from basic data like Microsoft Words to complex data like voice data. This enhances the quality of the management of data archived. === Disadvantages === ==== 1. Higher financial costs ==== Implementing TDMS into the organisation's systems involves monetary costs. Maintenance costs certain amount of human resources and money as well. These resources involve opportunity costs as they can be utilized in other aspects. ==== 2. Lower stability ==== Since TDMS manages and centralizes all the data the organisation processes, it links the working processes within the whole organisation together. It also increases the vulnerability of the organisation data network. If TDMS is not stable enough or when it is exposed to hacker and virus attacks, the organisation's data flow might shut down completely, affecting the work in an organisation-wide scale and leading to a lower stability as results. == Comparison between traditional data management approaches and technical data management systems == Test engineers and researchers are facing great challenges in turning complex test results and simulation data into usable information for higher yields of firms. These challenges are listed below. Increase in complication of designs Reduced in time and budgets available Higher quality is demanded === Traditional data management approaches === Many organisations are still applying the conventional file management systems, due to the difficulty in building a proper and complete archives for data management. The first approach is the simple file-folder system. This costs the problem of ineffectiveness as workers and researchers have to manually go through numerous layers of systems and files for the target data. Moreover, the target data may contain files with different formats and these files may not be stored in the same machine. These files are also easily lost if renamed or moved to another location. The second approach is conventional databases such as Oracle. These databases are capable of enabling easy search and access of data. However, a great drawback is that huge effort for preparing and modeling the data is required. For large-scale projects, huge monetary costs are induced, and extra IT human resources must be employed for constant handling, expanding and maintaining the inflexible system, which is custom for specific tasks, instead of all tasks. In the long-term, it is not cost-effective. === Technical data management systems (TDMS) === TDMS is developed based on 3 principles, flexible and organized file storage, self-scaling hybrid data index, and an interactive post-processing environment. The system in practical, mainly consists of 3 components, data files with essential and relevant Metadata, data finders for organizing and managing data regardless of files formats, and, a software of searching, analyzing and reporting. With metadata attached to original data files, the data finder can identify different related data files during searches, even if they are in different file formats. TDMS hence allows researchers to search for data like browsing the Internet. Last but not least, it can adapt to changes and update itself according to the changes, unlike databases. == Comparison between strong information systems and weak information systems == Complex organizations may need large amounts

    Read more →
  • Collective operation

    Collective operation

    Collective operations are building blocks for interaction patterns, that are often used in SPMD algorithms in the parallel programming context. Hence, there is an interest in efficient realizations of these operations. A realization of the collective operations is provided by the Message Passing Interface (MPI). == Definitions == In all asymptotic runtime functions, we denote the latency α {\displaystyle \alpha } (or startup time per message, independent of message size), the communication cost per word β {\displaystyle \beta } , the number of processing units p {\displaystyle p} and the input size per node n {\displaystyle n} . In cases where we have initial messages on more than one node we assume that all local messages are of the same size. To address individual processing units we use p i ∈ { p 0 , p 1 , … , p p − 1 } {\displaystyle p_{i}\in \{p_{0},p_{1},\dots ,p_{p-1}\}} . If we do not have an equal distribution, i.e. node p i {\displaystyle p_{i}} has a message of size n i {\displaystyle n_{i}} , we get an upper bound for the runtime by setting n = max ( n 0 , n 1 , … , n p − 1 ) {\displaystyle n=\max(n_{0},n_{1},\dots ,n_{p-1})} . A distributed memory model is assumed. The concepts are similar for the shared memory model. However, shared memory systems can provide hardware support for some operations like broadcast (§ Broadcast) for example, which allows convenient concurrent read. Thus, new algorithmic possibilities can become available. == Broadcast == The broadcast pattern is used to distribute data from one processing unit to all processing units, which is often needed in SPMD parallel programs to dispense input or global values. Broadcast can be interpreted as an inverse version of the reduce pattern (§ Reduce). Initially only root r {\displaystyle r} with i d {\displaystyle id} 0 {\displaystyle 0} stores message m {\displaystyle m} . During broadcast m {\displaystyle m} is sent to the remaining processing units, so that eventually m {\displaystyle m} is available to all processing units. Since an implementation by means of a sequential for-loop with p − 1 {\displaystyle p-1} iterations becomes a bottleneck, divide-and-conquer approaches are common. One possibility is to utilize a binomial tree structure with the requirement that p {\displaystyle p} has to be a power of two. When a processing unit is responsible for sending m {\displaystyle m} to processing units i . . j {\displaystyle i..j} , it sends m {\displaystyle m} to processing unit ⌈ ( i + j ) / 2 ⌉ {\displaystyle \left\lceil (i+j)/2\right\rceil } and delegates responsibility for the processing units ⌈ ( i + j ) / 2 ⌉ . . j {\displaystyle \left\lceil (i+j)/2\right\rceil ..j} to it, while its own responsibility is cut down to i . . ⌈ ( i + j ) / 2 ⌉ − 1 {\displaystyle i..\left\lceil (i+j)/2\right\rceil -1} . Binomial trees have a problem with long messages m {\displaystyle m} . The receiving unit of m {\displaystyle m} can only propagate the message to other units, after it received the whole message. In the meantime, the communication network is not utilized. Therefore pipelining on binary trees is used, where m {\displaystyle m} is split into an array of k {\displaystyle k} packets of size ⌈ n / k ⌉ {\displaystyle \left\lceil n/k\right\rceil } . The packets are then broadcast one after another, so that data is distributed fast in the communication network. Pipelined broadcast on balanced binary tree is possible in O ( α log ⁡ p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , whereas for the non-pipelined case it takes O ( ( α + β n ) log ⁡ p ) {\displaystyle {\mathcal {O}}((\alpha +\beta n)\log p)} cost. == Reduce == The reduce pattern is used to collect data or partial results from different processing units and to combine them into a global result by a chosen operator. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by ⊗ {\displaystyle \otimes } and the result is eventually stored on p 0 {\displaystyle p_{0}} . The reduction operator ⊗ {\displaystyle \otimes } must be associative at least. Some algorithms require a commutative operator with a neutral element. Operators like s u m {\displaystyle sum} , m i n {\displaystyle min} , m a x {\displaystyle max} are common. Implementation considerations are similar to broadcast (§ Broadcast). For pipelining on binary trees the message must be representable as a vector of smaller object for component-wise reduction. Pipelined reduce on a balanced binary tree is possible in O ( α log ⁡ p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} . == All-Reduce == The all-reduce pattern (also called allreduce) is used if the result of a reduce operation (§ Reduce) must be distributed to all processing units. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by an operator ⊗ {\displaystyle \otimes } and the result is eventually stored on all p i {\displaystyle p_{i}} . Analog to the reduce operation, the operator ⊗ {\displaystyle \otimes } must be at least associative. All-reduce can be interpreted as a reduce operation with a subsequent broadcast (§ Broadcast). For long messages a corresponding implementation is suitable, whereas for short messages, the latency can be reduced by using a hypercube (Hypercube (communication pattern) § All-Gather/ All-Reduce) topology, if p {\displaystyle p} is a power of two. All-reduce can also be implemented with a butterfly algorithm and achieve optimal latency and bandwidth. All-reduce is possible in O ( α log ⁡ p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , since reduce and broadcast are possible in O ( α log ⁡ p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} with pipelining on balanced binary trees. All-reduce implemented with a butterfly algorithm achieves the same asymptotic runtime. == Prefix-Sum/Scan == The prefix-sum or scan operation is used to collect data or partial results from different processing units and to compute intermediate results by an operator, which are stored on those processing units. It can be seen as a generalization of the reduce operation (§ Reduce). Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} . The operator ⊗ {\displaystyle \otimes } must be at least associative, whereas some algorithms require also a commutative operator and a neutral element. Common operators are s u m {\displaystyle sum} , m i n {\displaystyle min} and m a x {\displaystyle max} . Eventually processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ <= i {\displaystyle \otimes _{i'<=i}} m i ′ {\displaystyle m_{i'}} . In the case of the so-called exclusive prefix sum, processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ < i {\displaystyle \otimes _{i' Read more →

  • Information access

    Information access

    Information access is the freedom or ability to identify, obtain and make use of database or information effectively. There are various research efforts in information access for which the objective is to simplify and make it more effective for human users to access and further process large and unwieldy amounts of data and information. == Technology == Several technologies applicable to the general area are Information Retrieval, Text Mining, Machine Translation, and Text Categorisation. During discussions on free access to information as well as on information policy, information access is understood as concerning the insurance of free and closed access to information. Information access covers many issues including copyright, open source, privacy, and security. == Groups == Groups such as the American Library Association, the American Association of Law Libraries, Ralph Nader's Taxpayers Assets Project have advocated for free access to legal information. The vendor neutral citation movement in the legal field is working to ensure that courts will accept citations from cases on the web which do not have the traditional (copyrighted) page numbers from the West Publishing company. There is a worldwide Free Access to Law Movement which advocates free access to legal information. The Wired article "Who Owns The Law" is an introduction to the access to legal information issue. Postsecondary organizations such as K-12 work to share information. They feel it is a legal and moral obligation to provide access (including to people with disabilities or impairments) to information through the services and programs they offer. Some effects of charging for information access, such as literature searches for physicians, is studied in the article "Fee or Free: The Effect of Charging on Information Demand". In this study, a $5 charge resulted in a 77% decrease in searches.

    Read more →
  • Boyfriend Maker

    Boyfriend Maker

    Boyfriend Maker was a dating sim, romance chatbot smartphone app for iOS (iPhone) and Android devices, developed by Japanese studio 36 You Games (styled as 36You) and distributed under the freemium business model. Boyfriend Maker incorporated advanced artificial intelligence chat technology a decade before products such as ChatGPT. According to the developer's website, Boyfriend Maker is an "app that lets you interact and chat with quirky virtual boyfriends". While each virtual boyfriend has certain unique characteristics, the various instances of the boyfriend are powered by a chat engine, that (at least within a language and market) can utilise vocabulary and knowledge acquired in a chat with one user in subsequent chats with other users. == Gameplay == Users gain experience points and in-game coins. Users can customize their virtual boyfriend's appearance by selecting items such as hair, clothing, face, and a necklace. == Apple delisting and reintroduction == In late November 2012, the original iOS Boyfriend Maker app was delisted from the Apple App Store due to "ribald" chat, according to the New York Times. Boyfriend Maker was removed by Apple due to "reports of references to violent sexual acts and pedophilia". Boyfriend Maker had an age rating of 4+, even though the chat bot "responds with often strange and explicit text unsuitable for young children". User-posted chat excerpts indicate that the virtual boyfriend would sometimes transition abruptly to sexual chat in response to a seemingly innocent question. In one user-posted example, in response to the question, "what kind of wedding cake will we have" the boyfriend responds, "a good sex ima be on top of u u gonna ride oon me bitin the pillow gurrl ima fuck da shit out of u". The developer's use of the SimSimi-created third-party chat engine may be responsible for the sexual text. As the virtual boyfriend converses with human users, the SimSimi chat engine acquires vocabulary from users of the game and applies this "learned" vocabulary in chats with other users. The chat engine might also employ lines harvested from human-human chat logs, song lyrics, movies or TV shows. In April 2013, a detuned and presumably tamer version of the app, titled Boyfriend Plus, was permitted on Apple's App Store.

    Read more →
  • SciDB

    SciDB

    SciDB is a column-oriented database management system (DBMS) designed for multidimensional data management and analytics common to scientific, geospatial, financial, and industrial applications. It is developed by Paradigm4 and co-created by Michael Stonebraker. == History == Stonebraker claims that arrays are 100 times faster in SciDB than in a relational DBMS on a class of problems. It is swapping rows and columns for mathematical arrays that put fewer restrictions on the data and can work in any number of dimensions unlike the conventionally widely used relational database management system model, in which each relation supports only one dimension of records. A 2011 conference presentation on SciDB promoted it as "not Hadoop". Marilyn Matz became chief executive Paradigm4 in 2014.

    Read more →
  • Virtual facility

    Virtual facility

    A Virtual Facility (VF) is a highly realistic digital representation of a data center, used to model all relevant aspects of a physical data center with a high degree of precision. The term "virtual" in Virtual Facility refers to its use of virtual reality, rather than the abstraction of computer resources as seen in platform virtualization. The VF mirrors the characteristics of a physical facility over time and allows for detailed analysis and modeling. == VF Model features == A standard VF model includes: Three-dimensional physical facility layout Network connectivity of facility equipment Full inventory of facility equipment, including electronics and electrical systems such as power distribution units (PDUs) and uninterruptible power supplies (UPSs) Full air conditioning system (ACUs) and controls within the room The term Virtual Facility was introduced to address the emerging environmental problems facing modern Mission Critical Facilities (MCFs). This concept combines virtual reality (VR), computer simulation, and expert systems applied to the domain of facilities. The VF type of computer simulation allows for detailed analysis and prototyping of airflow in the data center using computational fluid dynamics (CFD) techniques. This enables the visualization and numerical analysis of airflow and temperatures within the facility, helping to predict real-world outcomes. == VF applications == The VF model can be used to assist with the following: Greenfield design Asset management Troubleshooting existing data centers Making existing data centers more resilient Making existing data centers more energy efficient Cost prediction Staff training Capacity planning Load growth management Many organizations use VF models to virtually assess scenarios before committing resources to physical changes. This allows for better decision-making regarding the addition or modification of equipment, helping to avoid logistical or thermal problems.

    Read more →
  • Terminology extraction

    Terminology extraction

    Terminology extraction (also known as term extraction, glossary extraction, term recognition, or terminology mining) is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus. In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc. The development of terminology extraction is also essential to the language industry. One of the first steps to model a knowledge domain is to collect a vocabulary of domain-relevant terms, constituting the linguistic surface manifestation of domain concepts. Several methods to automatically extract technical terms from domain-specific document warehouses have been described in the literature. Typically, approaches to automatic term extraction make use of linguistic processors (part of speech tagging, phrase chunking) to extract terminological candidates, i.e. syntactically plausible terminological noun phrases. Noun phrases include compounds (e.g. "credit card"), adjective noun phrases (e.g. "local tourist information office"), and prepositional noun phrases (e.g. "board of directors"). In English, the first two (compounds and adjective noun phrases) are the most frequent. Terminological entries are then filtered from the candidate list using statistical and machine learning methods. Once filtered, because of their low ambiguity and high specificity, these terms are particularly useful for conceptualizing a knowledge domain or for supporting the creation of a domain ontology or a terminology base. Furthermore, terminology extraction is a very useful starting point for semantic similarity, knowledge management, human translation and machine translation, etc. == Bilingual terminology extraction == The methods for terminology extraction can be applied to parallel corpora. Combined with e.g. co-occurrence statistics, candidates for term translations can be obtained. Bilingual terminology can be extracted also from comparable corpora (corpora containing texts within the same text type, domain but not translations of documents between each other).

    Read more →
  • Ericom Connect

    Ericom Connect

    Ericom Connect is a remote access/application publishing solution produced by Ericom Software that provides secure, centrally managed access to physical or hosted desktops and applications running on Microsoft Windows and Linux systems. == Product overview == Ericom Connect is desktop virtualization and application virtualization software that allows users to run applications remotely, without installing them on the local computer or device. The software is noted for its scalability, ease of deployment, and compatibility with any type of infrastructure, cloud or physical. Ericom Connect uses AccessPad (native client for desktops), AccessToGo (native client for mobile), or AccessNow, one of the first HTML5 RDP solutions to support clientless access to Windows desktops and applications from any device with an HTML5-compatible browser, including Macintosh computers, mobile devices, and Google Chromebooks. Other notable features include performance monitoring, built-in real-time analytics & BI, support for two-factor authentication (using RSA SecurID), multi-tenancy and multi-datacenter support via a single unified web interface, and a “Launch Simulation” feature that allows users to visualize and simulate actual step-by-step user processes directly from within the administration console. In addition to scalability, by distributing configurations, logs, etc., across multiple servers there is no single point of failure, as can be the case if all configuration information is stored on one server. == History == Ericom Connect was introduced in 2015. Ericom Connect is a successor to Ericom PowerTerm Web Connect. PowerTerm Web Connect used an architecture similar to what was then current with Citrix and VMWare, relying on a centralized SQL server, a connection broker, image management for different hypervisors, and a variety of clients. Ericom Connect uses a new grid architecture that provides more scalability, reliability, and flexibility than before.

    Read more →
  • Data management plan

    Data management plan

    A data management plan or DMP is a formal document that outlines how data are to be handled both during a research project, and after the project is completed. The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; this may lead to data being well-managed in the present, and prepared for preservation in the future. DMPs were originally used in 1966 to manage aeronautical and engineering projects' data collection and analysis, and expanded across engineering and scientific disciplines in the 1970s and 1980s. Up until the early 2000s, DMPs were used "for projects of great technical complexity, and for limited mid-study data collection and processing purposes". In the 2000s and later, E-research and economic policies drove the development and uptake of DMPs. == Importance == Preparing a data management plan before data are collected is claimed to ensure that data are in the correct format, organized well, and better annotated. This could arguably save time in the long term because there is no need to re-organize, re-format, or try to remember details about data. It is also claimed to increase research efficiency since both the data collector and other researchers might be able to understand and use well-annotated data in the future. One component of a data management plan is data archiving and preservation. By deciding on an archive ahead of time, the data collector can format data during collection to make its future submission to a database easier. If data are preserved, they are more relevant since they can be re-used by other researchers. It also allows the data collector to direct requests for data to the database, rather than address requests individually. A frequent argument in favor of preservation is that data that are preserved have the potential to lead to new, unanticipated discoveries, and they prevent duplication of scientific studies that have already been conducted. Data archiving also provides insurance against loss by the data collector. In the 2010s, funding agencies increasingly required data management plans as part of the proposal and evaluation process, despite little or no evidence of their efficacy. == Major components == "There is no general and definitive list of topics that should be covered in a DMP for a research project", and researchers are often left to their own devices as to how to fill out a DMP. === Information about data and data format === A description of data to be produced by the project. This might include (but is not limited to) data that are: Experimental Observational Raw or derived Physical collections Models Simulations Curriculum materials Software Images How will the data be acquired? When and where will they be acquired? After collection, how will the data be processed? Include information about Software used Algorithms Scientific workflows File formats that will be used, justify those formats, and describe the naming conventions used. Quality assurance & quality control measures that will be taken during sample collection, analysis, and processing. If existing data are used, what are their origins? How will the data collected be combined with existing data? What is the relationship between the data collected and existing data? How will the data be managed in the short-term? Consider the following: Version control for files Backing up data and data products Security & protection of data and data products Who will be responsible for management === Metadata content and format === Metadata are the contextual details, including any information important for using data. This may include descriptions of temporal and spatial details, instruments, parameters, units, files, etc. Metadata is commonly referred to as "data about data". Issues to be considered include: How detailed has the metadata to be in order to make the data meaningful? How will the metadata be created and/or captured? Examples include lab notebooks, GPS hand-held units, Auto-saved files on instruments, etc. What format will be used for the metadata? What are the metadata standards commonly used in the respective scientific discipline? There should be justification for the format chosen. === Policies for access, sharing, and re-use === Describe any obligations that exist for sharing data collected. These may include obligations from funding agencies, institutions, other professional organizations, and legal requirements. Include information about how data will be shared, including when the data will be accessible, how long the data will be available, how access can be gained, and any rights that the data collector reserves for using data. Address any ethical or privacy issues with data sharing Address intellectual property & copyright issues. Who owns the copyright? What are the institutional, publisher, and/or funding agency policies associated with intellectual property? Are there embargoes for political, commercial, or patent reasons? Describe the intended future uses/users for the data Indicate how the data should be cited by others. How will the issue of persistent citation be addressed? For example, if the data will be deposited in a public archive, will the dataset have a persistent identifier (e.g., ARK, DOI, Handle, PURL, URN) assigned to it? === Long-term storage and data management === Researchers should identify an appropriate archive for the long-term preservation of their data. By identifying the archive early in the project, the data can be formatted, transformed, and documented appropriately to meet the requirements of the archive. Researchers should consult colleagues and professional societies in their discipline to determine the most appropriate database, and include a backup archive in their data management plan in case their first choice goes out of existence. Early in the project, the primary researcher should identify what data will be preserved in an archive. Usually, preserving the data in its most raw form is desirable, although data derivatives and products can also be preserved. An individual should be identified as the primary contact person for archived data, and ensure contact information is always kept up-to-date in case there are requests for data or information about data. === Budget === Data management and preservation costs may be considerable, depending on the nature of the project. By anticipating costs ahead of time, researchers ensure that the data will be properly managed and archived. Potential expenses that should be considered are Human resources and staff as they handle data preparation, management, documentation, and preservation Hardware and/or software needed for data management, backing up, security, documentation, and preservation Costs associated with submitting the data to an archive The data management plan should include how these costs will be paid. == NSF Data Management Plan == All grant proposals submitted to National Science Foundation (NSF) must include a Data Management Plan that is no more than two pages. This is a supplement (not part of the 15-page proposal) and should describe how the proposal will conform to the Award and Administration Guide policy (see below). It may include the following: The types of data The standards to be used for data and metadata format and content Policies for access and sharing Policies and provisions for re-use Plans for archiving data Policy summarized from the NSF Award and Administration Guide, Section 4 (Dissemination and Sharing of Research Results): Promptly publish with appropriate authorship Share data, samples, physical collections, and supporting materials with others, within a reasonable time frame Share software and inventions Investigators can keep their legal rights over their intellectual property, but they still have to make their results, data, and collections available to others Policies will be implemented via Proposal review Award negotiations and conditions Support/incentives == ESRC Data Management Plan == Since 1995, the UK's Economic and Social Research Council (ESRC) have had a research data policy in place. The current ESRC Research Data Policy states that research data created as a result of ESRC-funded research should be openly available to the scientific community to the maximum extent possible, through long-term preservation and high-quality data management. ESRC requires a data management plan for all research award applications where new data are being created. Such plans are designed to promote a structured approach to data management throughout the data lifecycle, resulting in better quality data that is ready to archive for sharing and re-use. The UK Data Service, the ESRC's flagship data service, provides practical guidance on research data management planning suitable for social science researchers in the UK and around the world. ESRC has a longstanding arrangement with the UK Data A

    Read more →
  • DONE

    DONE

    The Data-based Online Nonlinear Extremumseeker (DONE) algorithm is a black-box optimization algorithm. DONE models the unknown cost function and attempts to find an optimum of the underlying function. The DONE algorithm is suitable for optimizing costly and noisy functions and does not require derivatives. An advantage of DONE over similar algorithms, such as Bayesian optimization, is that the computational cost per iteration is independent of the number of function evaluations. == Methods == The DONE algorithm was first proposed by Hans Verstraete and Sander Wahls in 2015. The algorithm fits a surrogate model based on random Fourier features and then uses a well-known L-BFGS algorithm to find an optimum of the surrogate model. == Applications == DONE was first demonstrated for maximizing the signal in optical coherence tomography measurements, but has since then been applied to various other applications. For example, it was used to help extending the field of view in light sheet fluorescence microscopy.

    Read more →
  • Pseudonymization

    Pseudonymization

    Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing. Pseudonymization (or pseudonymisation, the spelling under European guidelines) is one way to comply with the European Union's General Data Protection Regulation (GDPR) demands for secure data storage of personal information. Pseudonymized data can be restored to its original state with the addition of information which allows individuals to be re-identified. In contrast, anonymization is intended to prevent re-identification of individuals within the dataset. Clause 18, Module Four, footnote 2 of the Adoption by the European Commission of the Implementing Decisions (EU) 2021/914 "requires rendering the data anonymous in such a way that the individual is no longer identifiable by anyone ... and that this process is irreversible." == Impact of Schrems II ruling == The European Data Protection Supervisor (EDPS) on 9 December 2021 highlighted pseudonymization as the top technical supplementary measure for Schrems II compliance. Less than two weeks later, the EU Commission highlighted pseudonymization as an essential element of the equivalency decision for South Korea, which is the status that was lost by the United States under the Schrems II ruling by the Court of Justice of the European Union (CJEU). The importance of GDPR-compliant pseudonymization increased dramatically in June 2021 when the European Data Protection Board (EDPB) and the European Commission highlighted GDPR-compliant pseudonymization as the state-of-the-art technical supplementary measure for the ongoing lawful use of EU personal data when using third country (i.e., non-EU) cloud processors or remote service providers under the "Schrems II" ruling by the CJEU. Under the GDPR and final EDPB Schrems II Guidance, the term pseudonymization requires a new protected "state" of data, producing a protected outcome that: Protects direct, indirect, and quasi-identifiers, together with characteristics and behaviors; Protects at the record and data set level versus only the field level so that the protection travels wherever the data goes, including when it is in use; and Protects against unauthorized re-identification via the mosaic effect by generating high entropy (uncertainty) levels by dynamically assigning different tokens at different times for various purposes. The combination of these protections is necessary to prevent the re-identification of data subjects without the use of additional information kept separately, as required under GDPR Article 4(5) and as further underscored by paragraph 85(4) of the final EDPB Schrems II guidance: Article 4(5) "Definitions" of the GDPR defines pseudonymization as "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person." "Use Case 2: Transfer of pseudonymised Data Paragraph 85(4)" of the final EDPB Schrems II Guidance requires that “the controller has established by means of a thorough analysis of the data in question – taking into account any information that the public authorities of the recipient country may be expected to possess and use – that the pseudonymised personal data cannot be attributed to an identified or identifiable natural person even if cross-referenced with such information." GDPR-compliant pseudonymization requires that data is "anonymous" in the strictest EU sense of the word – globally anonymous – but for the additional information held separately and made available under controlled conditions as authorized by the data controller for permitted re-identification of individual data subjects. Clause 18, Module Four, footnote 2 of the Adoption by the European Commission of the Implementing Decision (EU) 2021/914 "requires rendering the data anonymous in such a way that the individual is no longer identifiable by anyone, in line with recital 26 of Regulation (EU) 2016/679, and that this process is irreversible." Before the Schrems II ruling, pseudonymization was a technique used by security experts or government officials to hide personally identifiable information to maintain data structure and privacy of information. Some common examples of sensitive information include postal code, location of individuals, names of individuals, race and gender, etc. After the Schrems II ruling, GDPR-compliant pseudonymization must satisfy the above-noted elements as an "outcome" versus merely a technique. == Data fields == The choice of which data fields are to be pseudonymized is partly subjective. Less selective fields, such as birth date or postal code are often also included because they are usually available from other sources and therefore make a record easier to identify. Pseudonymizing these less identifying fields removes most of their analytic value and is therefore normally accompanied by the introduction of new derived and less identifying forms, such as year of birth or a larger postal code region. Data fields that are less identifying, such as date of attendance, are usually not pseudonymized. This is because too much statistical utility is lost in doing so, not because the data cannot be identified. For example, given prior knowledge of a few attendance dates it is easy to identify someone's data in a pseudonymized dataset by selecting only those people with that pattern of dates. This is an example of an inference attack. The weakness of pre-GDPR pseudonymized data to inference attacks is commonly overlooked. A famous example is the AOL search data scandal. The AOL example of unauthorized re-identification did not require access to separately kept "additional information" that was under the control of the data controller as is now required for GDPR-compliant pseudonymization, outlined below under the section "New Definition for Pseudonymization Under GDPR". Protecting statistically useful pseudonymized data from re-identification requires: a sound information security base controlling the risk that the analysts, researchers or other data workers cause a privacy breach The pseudonym allows tracking back of data to its origins, which distinguishes pseudonymization from anonymization, where all person-related data that could allow backtracking has been purged. Pseudonymization is an issue in, for example, patient-related data that has to be passed on securely between clinical centers. The application of pseudonymization to e-health intends to preserve the patient's privacy and data confidentiality. It allows primary use of medical records by authorized health care providers and privacy preserving secondary use by researchers. In the US, HIPAA provides guidelines on how health care data must be handled and data de-identification or pseudonymization is one way to simplify HIPAA compliance. However, plain pseudonymization for privacy preservation often reaches its limits when genetic data are involved (see also genetic privacy). Due to the identifying nature of genetic data, depersonalization is often not sufficient to hide the corresponding person. Potential solutions are the combination of pseudonymization with fragmentation and encryption. An example of application of pseudonymization procedure is creation of datasets for de-identification research by replacing identifying words with words from the same category (e.g. replacing a name with a random name from the names dictionary), however, in this case it is in general not possible to track data back to its origins. == New definition under GDPR == Effective as of May 25, 2018, the EU General Data Protection Regulation (GDPR) defines pseudonymization for the very first time at the EU level in Article 4(5). Under Article 4(5) definitional requirements, data is pseudonymized if it cannot be attributed to a specific data subject without the use of separately kept "additional information". Pseudonymized data embodies the state of the art in Data Protection by Design and by Default because it requires protection of both direct and indirect identifiers (not just direct). GDPR Data Protection by Design and by Default principles as embodied in pseudonymization require protection of both direct and indirect identifiers so that personal data is not cross-referenceable (or re-identifiable) via the "mosaic effect" without access to "additional information" that is kept separately by the controller. Because access to separately kept "additional information" is required

    Read more →
  • Inauthentic text

    Inauthentic text

    An inauthentic text is a computer-generated expository document meant to appear as genuine, but which is actually meaningless. Frequently they are created in order to be intermixed with genuine documents and thus manipulate the results of search engines, as with Spam blogs. They are also carried along in email in order to fool spam filters by giving the spam the superficial characteristics of legitimate text. Sometimes nonsensical documents are created with computer assistance for humorous effect, as with Dissociated press or Flarf poetry. They have also been used to challenge the veracity of a publication—MIT students submitted papers generated by a computer program called SCIgen to a conference, where they were initially accepted. This led the students to claim that the bar for submissions was too low. With the amount of computer generated text outpacing the ability of people to humans to curate it, there needs some means of distinguishing between the two. Yet automated approaches to determining absolutely whether a text is authentic or not face intrinsic challenges of semantics. Noam Chomsky coined the phrase "Colorless green ideas sleep furiously" giving an example of grammatically correct, but semantically incoherent sentence; some will point out that in certain contexts one could give this sentence (or any phrase) meaning. The first group to use the expression in this regard can be found below from Indiana University. Their work explains in detail an attempt to detect inauthentic texts and identify pernicious problems of inauthentic texts in cyberspace. The site has a means of submitting text that assesses, based on supervised learning, whether a corpus is inauthentic or not. Many users have submitted incorrect types of data and have correspondingly commented on the scores. This application is meant for a specific kind of data; therefore, submitting, say, an email, will not return a meaningful score.

    Read more →
  • Agentic commerce

    Agentic commerce

    Agentic commerce (also referred to as agent-based commerce) describes an emerging form of e-commerce in which autonomous artificial intelligence (AI) agents independently execute purchasing and payment processes on behalf of users or organizations. Unlike conventional digital commerce systems, which require direct human interaction at key decision points, agentic commerce systems are designed to search for products or services, evaluate options, make purchasing decisions, and complete payments without real-time human involvement. An emerging development within the broader fields of e-commerce, fintech, and artificial intelligence; agentic commerce combines advances in generative AI, autonomous agents, application programming interfaces (APIs), and digital payment infrastructures to direct transactions with no direct human interaction. == Characteristics == A defining feature of agentic commerce is the delegation of end-to-end commercial activities to software agents. These agents typically operate according to predefined user preferences, rules, or constraints, such as price limits, quality criteria, delivery times, or preferred payment methods. Based on these parameters, an agent can autonomously perform tasks including product discovery, price comparison, contract selection, order placement, and payment execution. In contrast to decision-support systems, which provide recommendations to human users, agentic commerce systems are designed to act independently. Human involvement may be limited to initial configuration, periodic supervision, or exception handling. == Comparison with traditional and AI-assisted commerce == Traditional e-commerce requires users to manually browse products, select offers, and authorize payments. Generative AI systems used in commerce commonly assist users by answering questions or suggesting options, and do not complete transactions autonomously. Agentic commerce differs in that decision-making authority is partially or fully transferred to AI agents. As a result, the conventional customer journey, characterized by conscious decision points, may be replaced by continuous, automated micro-decisions performed by software. == Applications and business use cases == Potential applications of agentic commerce include recurring purchases, subscription management, business-to-business procurement, inventory replenishment, and price monitoring. In such contexts, transactions are often predictable and standardized, making them suitable for automation. From a business perspective, agentic commerce systems may be used to optimize supply chains, manage inventory levels, negotiate prices algorithmically, or execute transactions across multiple platforms. Enterprises adopting the new technology include retailers Walmart, Home Depot, Wayfair and Urban Outfitters, and ad tech DSPs, including Google Ads, Amazon, and Yahoo. Chinese tech firms are using apps to provide full-service shopping and payment tools. These includes Alibaba, Tencent, and ByteDance who are currently developing AI powered shopping apps. The Qwen AI chatbot allows users to complete transactions directly within its interface. US firms are still leading in developing AI models but integration is slower due to privacy restrictions. == Payments and technical infrastructure == Agentic commerce relies on digital payment systems capable of supporting automated, machine-initiated transactions, including API-based payment processing, tokenization, real-time authorization, and continuous risk monitoring. Typical user interfaces, such as shopping carts, may be replaced by backend integrations between AI agents, merchants, and payment service providers. For example, Iike 2025, Alibaba launched Alipay AI Pay, which grew and began operating as an application for different retailers. In December 2025, Alipay teamed up with Rokid to enable developers to integrate AI payments into AI agents on Rokid's Lingzhu platform. In January 2025, Alipay unveiled the Agentic Commerce Trust Protocol in partnership with Alibaba's consumer AI applications, such as the Qwen App and Taobao Instant Commerce. Qwen adopted the platform first, connecting it to Taobao Instant Commerce and Alipay AI Pay. Users could use Qwen's agentic feature to place food and drink orders within the application instead of having to click outside to an external browser. For merchants, participation in agentic commerce may require products and services to be presented in structured, machine-readable formats to ensure discoverability and interoperability with autonomous agents. == Universal Commerce Protocol (UCP) == In January 2026, Google announced the Universal Commerce Protocol (UCP), an open-source web standard intended to enable interoperability between AI agents and retail systems across the shopping journey, from discovery and checkout to post-purchase support. UCP makes use of REST, JSON-RPC transports, and support for Agent Payments Protocol (AP2), Agent2Agent (A2A), and Model Context Protocol (MCP). == Legal, regulatory, and security considerations == The use of autonomous agents in commerce raises legal and regulatory questions, particularly regarding authorization, liability, consumer protection, and fraud prevention. Existing payment and contract frameworks are generally based on human decision-makers, and their applicability to autonomous agents remains an area of active discussion. Open issues include responsibility for unauthorized or erroneous transactions, mechanisms for dispute resolution, standards for agent authentication, and compliance with data protection and financial regulations. Continuous, automated transaction patterns may also require new approaches to security and risk assessment. Traditional fraud models centered on identity verification may be insufficient for agentic commerce, and that merchants may need intent-based detection methods using machine learning and behavioral analysis to distinguish legitimate AI agents from malicious automation. === Governance frameworks === The deployment of autonomous AI agents in commercial environments has prompted the development of dedicated governance frameworks. These aim to define operational boundaries, decision authority, oversight mechanisms, and accountability structures for agentic systems. The Agentic Commerce Framework (ACF), created in 2025 by Vincent Dorange, is a governance standard that structures the deployment of autonomous AI agents around four founding principles (Decision Sovereignty, Governance by Design, Ultimate Human Control, Traceable Accountability), four operational layers, and 18 governance KPIs. In January 2026, Singapore's Infocomm Media Development Authority (IMDA) published the Model AI Governance Framework for Agentic AI, extending its existing AI governance guidelines to address agent-specific risks including delegation chains and multi-agent coordination. The Cloud Security Alliance (CSA) has also proposed an Agentic Trust Framework applying zero-trust principles to AI agent governance. == Ecosystem and implementation == The adoption of agentic commerce typically requires changes in commerce architecture, data modeling, identity and permissions, and API-based orchestration of checkout and post-purchase workflows. Management consultancies have identified agentic commerce as a structural evolution of digital commerce, emphasizing the role of AI-driven agents in automating discovery, decision-making, and transaction processes across commerce systems. McKinsey & Company has described agentic commerce as a significant shift in how consumers interact with brands and how enterprises design their commerce operating models. In Europe, this ecosystem also includes digital commerce consultancies specializing in the adoption of agentic commerce. Consulting firms such as Horrea support brands in understanding and implementing the technological and organizational shifts associated with agentic commerce. == Market development and outlook == Agentic commerce is generally regarded as an early-stage development. Industry analysts have projected that AI-driven agents could account for a small but growing share of digital payment transactions within the coming years. Due to the scale of global digital commerce, even limited adoption could represent substantial transaction volumes. Analysts expect that by 2029, AI agents could handle between 1% and 4% of all digital payment transactions. With a projected total transaction volume of over $36 trillion a year, even a small share translates into a market worth up to $1.47 trillion. According to a McKinsey study from October 2025, agentic commerce projects that by 2030, the U.S. business-to-consumer retail market alone could see up to $1 trillion in revenue orchestrated through agentic commerce. On a global scale, the opportunity could range from $3 trillion to $5 trillion. Early experiments and pilot projects have demonstrated both the potential and current limitations of the

    Read more →
  • Communication-avoiding algorithm

    Communication-avoiding algorithm

    Communication-avoiding algorithms minimize movement of data within a memory hierarchy for improving its running-time and energy consumption. These minimize the total of two costs (in terms of time and energy): arithmetic and communication. Communication, in this context refers to moving data, either between levels of memory or between multiple processors over a network. It is much more expensive than arithmetic. == Formal theory == === Two-level memory model === A common computational model in analyzing communication-avoiding algorithms is the two-level memory model: There is one processor and two levels of memory. Level 1 memory is infinitely large. Level 0 memory ("cache") has size M {\displaystyle M} . In the beginning, input resides in level 1. In the end, the output resides in level 1. Processor can only operate on data in cache. The goal is to minimize data transfers between the two levels of memory. === Matrix multiplication === Corollary 6.2: More general results for other numerical linear algebra operations can be found in. The following proof is from. == Motivation == Consider the following running-time model: Measure of computation = Time per FLOP = γ Measure of communication = No. of words of data moved = β ⇒ Total running time = γ·(no. of FLOPs) + β·(no. of words) From the fact that β >> γ as measured in time and energy, communication cost dominates computation cost. Technological trends indicate that the relative cost of communication is increasing on a variety of platforms, from cloud computing to supercomputers to mobile devices. The report also predicts that gap between DRAM access time and FLOPs will increase 100× over coming decade to balance power usage between processors and DRAM. Energy consumption increases by orders of magnitude as we go higher in the memory hierarchy. United States president Barack Obama cited communication-avoiding algorithms in the FY 2012 Department of Energy budget request to Congress: New Algorithm Improves Performance and Accuracy on Extreme-Scale Computing Systems. On modern computer architectures, communication between processors takes longer than the performance of a floating-point arithmetic operation by a given processor. ASCR researchers have developed a new method, derived from commonly used linear algebra methods, to minimize communications between processors and the memory hierarchy, by reformulating the communication patterns specified within the algorithm. This method has been implemented in the TRILINOS framework, a highly-regarded suite of software, which provides functionality for researchers around the world to solve large scale, complex multi-physics problems. == Objectives == Communication-avoiding algorithms are designed with the following objectives: Reorganize algorithms to reduce communication across all memory hierarchies. Attain the lower-bound on communication when possible. The following simple example demonstrates how these are achieved. === Matrix multiplication example === Let A, B and C be square matrices of order n × n. The following naive algorithm implements C = C + A B: for i = 1 to n for j = 1 to n for k = 1 to n C(i,j) = C(i,j) + A(i,k) B(k,j) Arithmetic cost (time-complexity): n2(2n − 1) for sufficiently large n or O(n3). Rewriting this algorithm with communication cost labelled at each step for i = 1 to n {read row i of A into fast memory} - n2 reads for j = 1 to n {read C(i,j) into fast memory} - n2 reads {read column j of B into fast memory} - n3 reads for k = 1 to n C(i,j) = C(i,j) + A(i,k) B(k,j) {write C(i,j) back to slow memory} - n2 writes Fast memory may be defined as the local processor memory (CPU cache) of size M and slow memory may be defined as the DRAM. Communication cost (reads/writes): n3 + 3n2 or O(n3) Since total running time = γ·O(n3) + β·O(n3) and β >> γ the communication cost is dominant. The blocked (tiled) matrix multiplication algorithm reduces this dominant term: ==== Blocked (tiled) matrix multiplication ==== Consider A, B and C to be n/b-by-n/b matrices of b-by-b sub-blocks where b is called the block size; assume three b-by-b blocks fit in fast memory. for i = 1 to n/b for j = 1 to n/b {read block C(i,j) into fast memory} - b2 × (n/b)2 = n2 reads for k = 1 to n/b {read block A(i,k) into fast memory} - b2 × (n/b)3 = n3/b reads {read block B(k,j) into fast memory} - b2 × (n/b)3 = n3/b reads C(i,j) = C(i,j) + A(i,k) B(k,j) - {do a matrix multiply on blocks} {write block C(i,j) back to slow memory} - b2 × (n/b)2 = n2 writes Communication cost: 2n3/b + 2n2 reads/writes << 2n3 arithmetic cost Making b as large possible: 3b2 ≤ M we achieve the following communication lower bound: 31/2n3/M1/2 + 2n2 or Ω (no. of FLOPs / M1/2) == Previous approaches for reducing communication == Most of the approaches investigated in the past to address this problem rely on scheduling or tuning techniques that aim at overlapping communication with computation. However, this approach can lead to an improvement of at most a factor of two. Ghosting is a different technique for reducing communication, in which a processor stores and computes redundantly data from neighboring processors for future computations. Cache-oblivious algorithms represent a different approach introduced in 1999 for fast Fourier transforms, and then extended to graph algorithms, dynamic programming, etc. They were also applied to several operations in linear algebra as dense LU and QR factorizations. The design of architecture specific algorithms is another approach that can be used for reducing the communication in parallel algorithms, and there are many examples in the literature of algorithms that are adapted to a given communication topology.

    Read more →