Plant Nanny is a water tracker mobile application which reminds users to drink water. It was developed by Taiwanese app maker Fourdesire. The app was first released in 2013 and is available on the Apple App Store for iPhones and the Google Play Store for Android devices. == Description == Play Nanny uses a game method that allows users to turn their virtual selves into plants, which grows and thrives as the user drinks more water. The app sends occasional push notifications to remind users to drink water throughout the day. Users can choose from a wide range of plants, including cacti and carnations, and track their water intake. The app uses two resources, How to calculate how much water you should drink by Jennifer Stone (2018) and Human energy requirements by the Food and Agriculture Organization (2004), to calculate the recommended daily water intake for its users. Upon downloading the app, users are prompted to input basic personal information which is then used to calculate the recommended daily water intake and prompts them to drink the appropriate amount. == Accolades ==
Touch 'n Go eWallet
Touch 'n Go eWallet is a Malaysian digital wallet and online payment platform, established in Kuala Lumpur, Malaysia, in July 2017 as a joint venture between Touch 'n Go and Ant Financial. It allows users to make payments at over 280,000 merchant touch points via QR code, as well as perform peer-to-peer (P2P) money transfers. Since then, the e-wallet further diversified for users to pay for tolls via RFID or PayDirect, street parking and various online payment spanning e-hailing, car-sharing apps or taxis, various overhead bills; top-up for mobile prepaid or in-game currencies; purchases on e-commerce websites; food delivery; renewing motor insurance and other insurance/takaful plans; and even movie, bus, trains or airline tickets. == Background == Prior to the launch of the e-wallet service, Touch 'n Go provided stored-value physical all-in-one contactless card (namely Touch 'n Go cards or "TnG cards") that users can use to pay for toll fares, public transportation and parking lots as well as purchases in some retail stores. In 1999, Touch 'n Go also markets SmartTag devices that allow road users to pass through certain toll booths without the need to unwind the car window. The high entry cost of the device (around RM 100 each) also meant that only few can enjoy the seamless experience. In 2009, Touch 'n Go partnered with Maxis to launch FastTap, a new mobile payment service that utilised Near-Field Communication (NFC). Maxis customers can make payments by placing the phone near the card readers (that also supports physical bank cards and Touch ’N Go cards). However, the venture featured only one phone model, Nokia 6212, which greatly limited the public reach. In July 2012, Touch 'n Go announced another collaboration with CIMB and Maxis to create similar NFC-based online transaction service that runs on compatible smartphones. Touch 'n Go Wallet was launched in February 2017 as an QR code-based e-wallet application, to compete with Samsung Pay that utilizes NFC modules. In the controlled pilot test in Taman Tun Dr Ismail, the correspondents can experience basic functionalities (prepaid mobile service reload, bills payment, movie tickets and flight tickets purchase, transfer of money with another user, and payments at participating stores and restaurants). While the deployed version of the app was generally well-received, the existing process to transfer the balance to the physical TnG card stored value from the app garnered unanimous backlash. Test groups felt that the need to head to a self-service terminal named "Pick Up Device" in person within 24 hours for completion, along with the failure to do so (the balance would be credited back to the wallet after 24 hours), was not divulged clearly and also defeated the purpose of convenience, not to mention there were only 2 such terminals. The feature was eventually suspended. On 15 November 2017, Touch 'n Go was granted permission by the Central Bank of Malaysia to form a joint venture with Ant Financial, a Chinese-based financial company that operates Alipay. The partnership allowed the local e-wallet to learn from and build upon the operational model pioneered by Alipay. In June 2018, it was reported that Touch 'n Go was pilot testing the uses of the Touch 'n Go eWallet in Rapid Transit, as the ticketing system was enabled on the Kelana Jaya line in the Klang Valley. Pilot testing only applied to stations in Kelana Jaya, KL Gateway–Universiti, Kerinchi, KL Sentral, Dang Wangi, KLCC, and Ampang Park. The test was reported to be successful in February 2020 and was planned to be fully deployed on the LRT and MRT. Due to unforeseen circumstances, this feature did not come into fruition, the app merely adds in-app purchase of monthly concession cards called "My50". In August 2018, Touch 'n Go announced that selected drivers may experience first-hand a new RFID-based payment (later rebranded as "myRFID") that serves to replace SmartTag devices on closed toll roads with during pilot testing phase commencing on 3 September 2018. On 2 November 2018, participation in the ongoing pilot programme was expanded, allowing more drivers to sign up ahead of the public rollout of the RFID system. During the same period, Touch 'n Go has discontinued the sales of SmartTAG devices in favor of the RFID-based payment system. Initially, the installation of the RFID chip onto the car could only be done by Touch 'n Go staff at the RFID fitment centers, at no cost. As the pilot testing concluded on 15 February 2020, a self-installation kit are being offered to the public on Lazada and Shopee. Support for taxi-hailing mobile apps was added in November 2018 when Touch 'n Go partnered with EzCab and Public Cab, allowing users to make payments via QR code. This was later expanded to support MULA on 7 January 2020, and later MyCar on 4 April 2020. Touch 'n Go eWallet was also the first eWallet to convert Kuala Lumpur's most famous Ramadan bazaar in Kampong Bahru into "Kampong Kashless", a venue that can accept cashless QR payments. It welcomed more than 250,000 Malaysians including local celebrities and government officials. On 1 October 2019, some e-commerce websites owned by the Alibaba Group (TMall and Taobao) began to support Touch 'n Go eWallet payments, Lazada joined the list on 29 October 2019. Touch 'n Go eWallet was one of the three e-wallet services in Malaysia (the other being Boost and GrabPay) that was eligible for its users to receive an RM 30 credit in conjunction of E-Tunai Rakyat program under the Budget 2020 plan, that further normalizes adoption of cashless and mobile payment among Malaysians. Unlike Boost and GrabPay, whose P2P transfers were completely disabled until users have exhausted the RM 30 first, Touch 'n Go eWallet did not impose such measures. in 2020, Touch 'n Go eWallet joined DuitNow, an electronic transaction ecosystem in Malaysia which allows the funds from Touch 'n Go eWallet to be transferred to other competing services and vice versa, by implementing a standard DuitNow QR code deisgn. Japan become the first country outside Malaysia to support Touch 'n Go eWallet payment via Alipay Connect. During the COVID-19 pandemic and the enforcement of the movement control order, use of eWallets (including Touch 'n Go eWallet) increased tremendously among citizens due to its contactless nature of the payment and increased take-out orders at home; which in turn helped small and medium-sized enterprises to thrive. Touch 'n Go eWallet launched its loyalty programme – The Goal Hunter – in October 2020 where on monthly basis, users collect stamps by paying with the app in exchange for rewards that include lucky draws and other vouchers. == Services == Touch 'n Go eWallet app is available for download on both Google Play and Apple Appstore. It utilizes QR code technology for local in-store payments. The Touch 'n Go eWallet app also diversifies payment types, including but not limited to Utility bills Purchase of motor insurance policy Pay Later facility Prepaid reload and Postpaid payment to telecommunications companies loan repayments for courts, MBSJ payments, zakat and PTPTN payment for car parking P2P transfer airline ticket bookings; movie tickets from TGV Cinemas RFID refuelling at Shell stations (defunct after Shell launched its own payment app in 2024) User can reload the eWallet credit by setting up auto-reload, purchasing reload pins from convenience stores (such as 7-Eleven, KK Super Mart, MyNews, Family Mart etc.), reloading by FPX and credit/debit card. The PayDirect feature allows users to link their physical Touch 'n Go cards into the eWallet, where the toll fare can be debited from the eWallet balance when flashing the card near the sensor. In the circumstance of insufficient balance in the app, the toll fare will be deducted from the physical card's balance instead. This also conveniently allows users to view the card's remaining balance. Touch 'n Go eWallet is the first and only eWallet to offer a money-back guarantee when an unauthorised transaction is made on the user’s eWallet account, subject to Terms & Conditions. Payment via QR code scanning, including Touch 'n Go eWallet, becomes a norm in most of the shops/restaurants across Malaysia, including roadside hawkers/stall owners and automatic vending machines. The merchants usually display their owner's individual QR or Business account that they can apply for in-app. The popularity attributes to the low merchant onboarding cost (Unlike NFC payment and debit/credit card that requires purchase or rental of a payment terminal device at a yearly fee.) The app is also one of the few ewallet that supports bidirectional liquidity (alongside MAE developed by Maybank), where funds can be transferred two-way with bank accounts. This is not possible with the other major ewallets (GrabPay, Boost, ShopeePay etc.) where the money that is reloaded to the wallet cannot be transferred to another bank account, unless through manual req
Random projection
In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points which lie in Euclidean space. According to theoretical results, random projection preserves distances well, but empirical results are sparse. They have been applied to many natural language tasks under the name random indexing. == Dimensionality reduction == Dimensionality reduction, as the name suggests, is reducing the number of random variables using various mathematical methods from statistics and machine learning. Dimensionality reduction is often used to reduce the problem of managing and manipulating large data sets. Dimensionality reduction techniques generally use linear transformations in determining the intrinsic dimensionality of the manifold as well as extracting its principal directions. For this purpose there are various related techniques, including: principal component analysis, linear discriminant analysis, canonical correlation analysis, discrete cosine transform, random projection, etc. Random projection is a simple and computationally efficient way to reduce the dimensionality of data by trading a controlled amount of error for faster processing times and smaller model sizes. The dimensions and distribution of random projection matrices are controlled so as to approximately preserve the pairwise distances between any two samples of the dataset. == Method == The core idea behind random projection is given in the Johnson-Lindenstrauss lemma, which states that if points in a vector space are of sufficiently high dimension, then they may be projected into a suitable lower-dimensional space in a way which approximately preserves pairwise distances between the points with high probability. In random projection, the original d {\displaystyle d} -dimensional data is projected to a k {\displaystyle k} -dimensional subspace, by multiplying on the left by a random matrix R ∈ R k × d {\displaystyle R\in \mathbb {R} ^{k\times d}} . Using matrix notation: If X d × N {\displaystyle X_{d\times N}} is the original set of N d-dimensional observations, then X k × N R P = R k × d X d × N {\displaystyle X_{k\times N}^{RP}=R_{k\times d}X_{d\times N}} is the projection of the data onto a lower k-dimensional subspace. Random projection is computationally simple: form the random matrix "R" and project the d × N {\displaystyle d\times N} data matrix X onto K dimensions of order O ( d k N ) {\displaystyle O(dkN)} . If the data matrix X is sparse with about c nonzero entries per column, then the complexity of this operation is of order O ( c k N ) {\displaystyle O(ckN)} . === Orthogonal random projection === A unit vector can be orthogonally projected to a random subspace. Let u {\displaystyle u} be the original unit vector, and let v {\displaystyle v} be its projection. The norm-squared ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as projecting a random point, uniformly sampled on the unit sphere, to its first k {\displaystyle k} coordinates. This is equivalent to sampling a random point in the multivariate gaussian distribution x ∼ N ( 0 , I d × d ) {\displaystyle x\sim {\mathcal {N}}(0,I_{d\times d})} , then normalizing it. Therefore, ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as ∑ i = 1 k x i 2 ∑ i = 1 k x i 2 + ∑ i = k + 1 d x i 2 {\displaystyle {\frac {\sum _{i=1}^{k}x_{i}^{2}}{\sum _{i=1}^{k}x_{i}^{2}+\sum _{i=k+1}^{d}x_{i}^{2}}}} , which by the chi-squared construction of the Beta distribution, has distribution Beta ( k / 2 , ( d − k ) / 2 ) {\displaystyle \operatorname {Beta} (k/2,(d-k)/2)} , with mean k / d {\displaystyle k/d} . We have a concentration inequality P r [ | ‖ v ‖ 2 − k d | ≥ ϵ k d ] ≤ 3 exp ( − k ϵ 2 / 64 ) {\displaystyle Pr\left[\left|\|v\|_{2}-{\frac {k}{d}}\right|\geq \epsilon {\sqrt {\frac {k}{d}}}\right]\leq 3\exp \left(-k\epsilon ^{2}/64\right)} for any ϵ ∈ ( 0 , 1 ) {\displaystyle \epsilon \in (0,1)} . === Gaussian random projection === The random matrix R can be generated using a Gaussian distribution. The first row is a random unit vector uniformly chosen from S d − 1 {\displaystyle S^{d-1}} . The second row is a random unit vector from the space orthogonal to the first row, the third row is a random unit vector from the space orthogonal to the first two rows, and so on. In this way of choosing R, and the following properties are satisfied: Spherical symmetry: For any orthogonal matrix A ∈ O ( d ) {\displaystyle A\in O(d)} , RA and R have the same distribution. Orthogonality: The rows of R are orthogonal to each other. Normality: The rows of R are unit-length vectors. === More computationally efficient random projections === Achlioptas has shown that the random matrix can be sampled more efficiently. Either the full matrix can be sampled IID according to R i , j = 3 / k × { + 1 with probability 1 6 0 with probability 2 3 − 1 with probability 1 6 {\displaystyle R_{i,j}={\sqrt {3/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{6}}\\0&{\text{with probability }}{\frac {2}{3}}\\-1&{\text{with probability }}{\frac {1}{6}}\end{cases}}} or the full matrix can be sampled IID according to R i , j = 1 / k × { + 1 with probability 1 2 − 1 with probability 1 2 {\displaystyle R_{i,j}={\sqrt {1/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{2}}\\-1&{\text{with probability }}{\frac {1}{2}}\end{cases}}} Both are efficient for database applications because the computations can be performed using integer arithmetic. More related study is conducted in. It was later shown how to use integer arithmetic while making the distribution even sparser, having very few nonzeroes per column, in work on the Sparse JL Transform. This is advantageous since a sparse embedding matrix means being able to project the data to lower dimension even faster. === Random Projection with Quantization === Random projection can be further condensed by quantization (discretization), with 1-bit (sign random projection) or multi-bits. It is the building block of SimHash, RP tree, and other memory efficient estimation and learning methods. == Large quasiorthogonal bases == The Johnson-Lindenstrauss lemma states that large sets of vectors in a high-dimensional space can be linearly mapped in a space of much lower (but still high) dimension n with approximate preservation of distances. One of the explanations of this effect is the exponentially high quasiorthogonal dimension of n-dimensional Euclidean space. There are exponentially large (in dimension n) sets of almost orthogonal vectors (with small value of inner products) in n–dimensional Euclidean space. This observation is useful in indexing of high-dimensional data. Quasiorthogonality of large random sets is important for methods of random approximation in machine learning. In high dimensions, exponentially large numbers of randomly and independently chosen vectors from equidistribution on a sphere (and from many other distributions) are almost orthogonal with probability close to one. This implies that in order to represent an element of such a high-dimensional space by linear combinations of randomly and independently chosen vectors, it may often be necessary to generate samples of exponentially large length if we use bounded coefficients in linear combinations. On the other hand, if coefficients with arbitrarily large values are allowed, the number of randomly generated elements that are sufficient for approximation is even less than dimension of the data space. == Implementations == RandPro - An R package for random projection sklearn.random_projection - A module for random projection from the scikit-learn Python library Weka implementation [1]
Multidimensional analysis
In statistics, econometrics and related fields, multidimensional analysis (MDA) is a data analysis process that groups data into two categories: data dimensions and measurements. For example, a data set consisting of the number of wins for a single football team at each of several years is a single-dimensional (in this case, longitudinal) data set. A data set consisting of the number of wins for several football teams in a single year is also a single-dimensional (in this case, cross-sectional) data set. A data set consisting of the number of wins for several football teams over several years is a two-dimensional data set. == Higher dimensions == In many disciplines, two-dimensional data sets are also called panel data. While, strictly speaking, two- and higher-dimensional data sets are "multi-dimensional", the term "multidimensional" tends to be applied only to data sets with three or more dimensions. For example, some forecast data sets provide forecasts for multiple target periods, conducted by multiple forecasters, and made at multiple horizons. The three dimensions provide more information than can be gleaned from two-dimensional panel data sets. == Software == Computer software for MDA include Online analytical processing (OLAP) for data in relational databases, pivot tables for data in spreadsheets, and Array DBMSs for general multi-dimensional data (such as raster data) in science, engineering, and business.
Mutation (evolutionary algorithm)
Mutation is a genetic operator used to maintain genetic diversity of the chromosomes of a population of an evolutionary algorithm (EA), including genetic algorithms in particular. It is analogous to biological mutation. The classic example of a mutation operator of a binary coded genetic algorithm (GA) involves a probability that an arbitrary bit in a genetic sequence will be flipped from its original state. A common method of implementing the mutation operator involves generating a random variable for each bit in a sequence. This random variable tells whether or not a particular bit will be flipped. This mutation procedure, based on the biological point mutation, is called single point mutation. Other types of mutation operators are commonly used for representations other than binary, such as floating-point encodings or representations for combinatorial problems. The purpose of mutation in EAs is to introduce diversity into the sampled population. Mutation operators are used in an attempt to avoid local minima by preventing the population of chromosomes from becoming too similar to each other, thus slowing or even stopping convergence to the global optimum. This reasoning also leads most EAs to avoid only taking the fittest of the population in generating the next generation, but rather selecting a random (or semi-random) set with a weighting toward those that are fitter. The following requirements apply to all mutation operators used in an EA: every point in the search space must be reachable by one or more mutations. there must be no preference for parts or directions in the search space (no drift). small mutations should be more probable than large ones. For different genome types, different mutation types are suitable. Some mutations are Gaussian, Uniform, Zigzag, Scramble, Insertion, Inversion, Swap, and so on. An overview and more operators than those presented below can be found in the introductory book by Eiben and Smith or in. == Bit string mutation == The mutation of bit strings ensue through bit flips at random positions. Example: The probability of a mutation of a bit is 1 l {\displaystyle {\frac {1}{l}}} , where l {\displaystyle l} is the length of the binary vector. Thus, a mutation rate of 1 {\displaystyle 1} per mutation and individual selected for mutation is reached. == Mutation of real numbers == Many EAs, such as the evolution strategy or the real-coded genetic algorithms, work with real numbers instead of bit strings. This is due to the good experiences that have been made with this type of coding. The value of a real-valued gene can either be changed or redetermined. A mutation that implements the latter should only ever be used in conjunction with the value-changing mutations and then only with comparatively low probability, as it can lead to large changes. In practical applications, the respective value range of the decision variables to be changed of the optimisation problem to be solved is usually limited. Accordingly, the values of the associated genes are each restricted to an interval [ x min , x max ] {\displaystyle [x_{\min },x_{\max }]} . Mutations may or may not take these restrictions into account. In the latter case, suitable post-treatment is then required as described below. === Mutation without consideration of restrictions === A real number x {\displaystyle x} can be mutated using normal distribution N ( 0 , σ ) {\displaystyle {\mathcal {N}}(0,\sigma )} by adding the generated random value to the old value of the gene, resulting in the mutated value x ′ {\displaystyle x'} : x ′ = x + N ( 0 , σ ) {\displaystyle x'=x+{\mathcal {N}}(0,\sigma )} In the case of genes with a restricted range of values, it is a good idea to choose the step size of the mutation σ {\displaystyle \sigma } so that it reasonably fits the range [ x min , x max ] {\displaystyle [x_{\min },x_{\max }]} of the gene to be changed, e.g.: σ = x max − x min 6 {\displaystyle \sigma ={\frac {x_{\text{max}}-x_{\text{min}}}{6}}} The step size can also be adjusted to the smaller permissible change range depending on the current value. In any case, however, it is likely that the new value x ′ {\displaystyle x'} of the gene will be outside the permissible range of values. Such a case must be considered a lethal mutation, since the obvious repair by using the respective violated limit as the new value of the gene would lead to a drift. This is because the limit value would then be selected with the entire probability of the values beyond the limit of the value range. The evolution strategy works with real numbers and mutation based on normal distribution. The step sizes are part of the chromosome and are subject to evolution together with the actual decision variables. === Mutation with consideration of restrictions === One possible form of changing the value of a gene while taking its value range [ x min , x max ] {\displaystyle [x_{\min },x_{\max }]} into account is the mutation relative parameter change of the evolutionary algorithm GLEAM (General Learning Evolutionary Algorithm and Method), in which, as with the mutation presented earlier, small changes are more likely than large ones. First, an equally distributed decision is made as to whether the current value x {\displaystyle x} should be increased or decreased and then the corresponding total change interval is determined. Without loss of generality, an increase is assumed for the explanation and the total change interval is then [ x , x max ] {\displaystyle [x,x_{\max }]} . It is divided into k {\displaystyle k} sub-areas of equal size with the width δ {\displaystyle \delta } , from which k {\displaystyle k} sub-change intervals of different size are formed: i {\displaystyle i} -th sub-change interval: [ x , x + δ ⋅ i ] {\displaystyle [x,x+\delta \cdot i]} with δ = ( x max − x ) k {\displaystyle \delta ={\frac {(x_{\text{max}}-x)}{k}}} and i = 1 , … , k {\displaystyle i=1,\dots ,k} Subsequently, one of the k {\displaystyle k} sub-change intervals is selected in equal distribution and a random number, also equally distributed, is drawn from it as the new value x ′ {\displaystyle x'} of the gene. The resulting summed probabilities of the sub-change intervals result in the probability distribution of the k {\displaystyle k} sub-areas shown in the adjacent figure for the exemplary case of k = 10 {\displaystyle k=10} . This is not a normal distribution as before, but this distribution also clearly favours small changes over larger ones. This mutation for larger values of k {\displaystyle k} , such as 10, is less well suited for tasks where the optimum lies on one of the value range boundaries. This can be remedied by significantly reducing k {\displaystyle k} when a gene value approaches its limits very closely. === Common properties === For both mutation operators for real-valued numbers, the probability of an increase and decrease is independent of the current value and is 50% in each case. In addition, small changes are considerably more likely than large ones. For mixed-integer optimization problems, rounding is usually used. == Mutation of permutations == Mutations of permutations are specially designed for genomes that are themselves permutations of a set. These are often used to solve combinatorial tasks. In the two mutations presented, parts of the genome are rotated or inverted. === Rotation to the right === The presentation of the procedure is illustrated by an example on the right: === Inversion === The presentation of the procedure is illustrated by an example on the right: === Variants with preference for smaller changes === The requirement raised at the beginning for mutations, according to which small changes should be more probable than large ones, is only inadequately fulfilled by the two permutation mutations presented, since the lengths of the partial lists and the number of shift positions are determined in an equally distributed manner. However, the longer the partial list and the shift, the greater the change in gene order. This can be remedied by the following modifications. The end index j {\displaystyle j} of the partial lists is determined as the distance d {\displaystyle d} to the start index i {\displaystyle i} : j = ( i + d ) mod | P 0 | {\displaystyle j=(i+d){\bmod {\left|P_{0}\right|}}} where d {\displaystyle d} is determined randomly according to one of the two procedures for the mutation of real numbers from the interval [ 0 , | P 0 | − 1 ] {\displaystyle \left[0,\left|P_{0}\right|-1\right]} and rounded. For the rotation, k {\displaystyle k} is determined similarly to the distance d {\displaystyle d} , but the value 0 {\displaystyle 0} is forbidden. For the inversion, note that i ≠ j {\displaystyle i\neq j} must hold, so for d {\displaystyle d} the value 0 {\displaystyle 0} must be excluded.
Test data management
Test data management (TDM) is a process in software testing concerned with the creation, preparation, and control of data used for testing software systems. It involves supplying datasets required to execute test cases and verifying system behaviour under defined conditions. Test data management is an integral part of the software development lifecycle (SDLC) and is utilized in both manual and automated testing processes. It is applied in environments that use continuous integration and DevOps practices, where test execution requires consistent and repeatable data conditions. == Overview == Test data management includes the generation, selection, and preparation of data for testing purposes, as well as its distribution across test environments. It also involves controlling data versions and ensuring that datasets correspond to specific test scenarios. In many cases, production data is adapted for testing through techniques such as masking or subsetting to reduce size and remove sensitive content. Test data management ensures that test cases are executed with relevant, consistent, and readily available data. This reduces variability in test results and supports reproducibility across test cycles. == Importance == The role of test data management has expanded with the growth of complex, data-driven systems and regulatory requirements governing data usage. Testing often depends on data that reflects real-world conditions, but direct use of production data may introduce security and privacy risks. As a result, organizations apply methods such as data masking and anonymization to meet compliance requirements, including those set by the California Privacy Rights Act (CPRA) and Europe’s General Data Protection Regulation (GDPR). Inadequate control of test data can lead to incomplete test coverage, unreliable test results, or delays in testing processes due to unavailable or inconsistent datasets. == Techniques and tools == Test data management leverages various techniques for preparing and controlling data used in testing. These include the generation of synthetic data, the extraction of subsets from production datasets, and the modification of data to remove or obscure sensitive information. A key technical requirement in these processes is maintaining referential integrity, or ensuring that relationships between data entities remain consistent across different tables and systems after masking or subsetting. Data virtualization is also used to provide access to datasets without full replication. These methods may be implemented using software tools that automate data preparation, masking, and distribution.
Computational learning theory
In computer science, computational learning theory (or just learning theory) is a subfield of artificial intelligence devoted to studying the design and analysis of machine learning algorithms. == Overview == Theoretical results in machine learning often focus on a type of inductive learning known as supervised learning. In supervised learning, an algorithm is provided with labeled samples. For instance, the samples might be descriptions of mushrooms, with labels indicating whether they are edible or not. The algorithm uses these labeled samples to create a classifier. This classifier assigns labels to new samples, including those it has not previously encountered. The goal of the supervised learning algorithm is to optimize performance metrics, such as minimizing errors on new samples. In addition to performance bounds, computational learning theory studies the time complexity and feasibility of learning . In computational learning theory, a computation is considered feasible if it can be done in polynomial time . There are two kinds of time complexity results: Positive results – Showing that a certain class of functions is learnable in polynomial time. Negative results – Showing that certain classes cannot be learned in polynomial time. Negative results often rely on commonly believed, but yet unproven assumptions, such as: Computational complexity – P ≠ NP (the P versus NP problem); Cryptographic – One-way functions exist. There are several different approaches to computational learning theory based on making different assumptions about the inference principles used to generalise from limited data. This includes different definitions of probability (see frequency probability, Bayesian probability) and different assumptions on the generation of samples. The different approaches include: Exact learning, proposed by Dana Angluin; Probably approximately correct learning (PAC learning), proposed by Leslie Valiant; VC theory, proposed by Vladimir Vapnik and Alexey Chervonenkis; Inductive inference as developed by Ray Solomonoff; Algorithmic learning theory, from the work of E. Mark Gold; Online machine learning, from the work of Nick Littlestone. While its primary goal is to understand learning abstractly, computational learning theory has led to the development of practical algorithms. For example, PAC theory inspired boosting, VC theory led to support vector machines, and Bayesian inference led to belief networks.