Recent Entries 10
- pattern minor 112d ago
  Given $k$ points in $n$ dimensions, with $n\geq3$, is there a polytime algorithm for finding a curve that splits them into 2 sets of points?

  In this math exchange question I asked, it was proven that for $n>2$ dimensions you can always find a curve that separates $k$ points in $n$-dimensional space into $2$ arbitrary, predefined sets. What I want to know is whether there is a polytime algorithm (polynomial in $k$) for determining this curve. If so, is there a known way to transform this curve into a function using some form of linear or non-linear transformation?
- principle minor 112d ago
  What is a good approach to symbol identification/recognition given a path instead of raster data?

  Excuse any mistakes in my description, as I'm new to ML. I have an application that takes user input to generate paths/curves (all symbols are single paths), and I would then like to attempt identification. This seems to be a well-studied problem, and I can find plenty of references. However, everything I've found so far starts with some sort of raster format, which makes sense given that input for recognition often arrives in that format. Given my particular constraints, though, it seems that having paths may provide additional useful data. So my question is: are there any good techniques for identification without first rasterizing my paths that may suit my case, or is rasterizing to a grid particularly well suited to the problem, so that I should just rasterize and solve this more classically?
- pattern moderate 112d ago
  Showing that Bayes classifier is optimal

  Consider a domain $X$, label set $Y=\{0,1\}$, and the zero-one loss. Given any probability distribution $D$ over $X\times\{0,1\}$, we've defined the Bayes classifier $f_D$ to be

  $$ f_{D}(x)= \begin{cases} 1 & \text{if }\mathbb{P}[y=1|x]\geq\tfrac{1}{2}\\ 0 & \text{otherwise.} \end{cases} $$

  I wish to prove that for any classifier $g\colon X\rightarrow\{0,1\}$, $L_D(f_D)\leq L_D(g)$, which means that $f_D$ is optimal. Here $L_D(h)$ is defined to be the "true error" of the classifier $h$, that is, $L_D(h)=D\{(x,y)\mid h(x)\neq y\}$. I'm having a hard time proving this from the definitions above, and some hints/intuition would be appreciated.
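As a sanity check of the claim (not a proof), the inequality can be verified by brute force on a small finite domain. The sketch below assumes a hypothetical three-point domain with made-up joint probabilities; the names `joint`, `true_error`, and `bayes_classifier` are illustrative only.

```python
from itertools import product

# Hypothetical joint distribution D over X x {0, 1}; the numbers are made up
# for illustration and sum to 1.
joint = {
    ("a", 0): 0.10, ("a", 1): 0.30,
    ("b", 0): 0.25, ("b", 1): 0.05,
    ("c", 0): 0.15, ("c", 1): 0.15,
}
domain = sorted({x for x, _ in joint})


def true_error(h):
    """L_D(h): total probability mass of pairs (x, y) with h(x) != y."""
    return sum(p for (x, y), p in joint.items() if h[x] != y)


def bayes_classifier():
    """f_D(x) = 1 iff P[y=1 | x] >= 1/2, i.e. iff D(x, 1) >= D(x, 0)."""
    return {x: 1 if joint[(x, 1)] >= joint[(x, 0)] else 0 for x in domain}


f_D = bayes_classifier()

# Enumerate every classifier g: X -> {0, 1} and record its true error.
all_errors = [
    true_error(dict(zip(domain, labels)))
    for labels in product([0, 1], repeat=len(domain))
]

bayes_error = true_error(f_D)
# Small tolerance guards against floating-point rounding in the sums.
assert all(bayes_error <= err + 1e-12 for err in all_errors)
```

The check also hints at the shape of a proof: the true error decomposes into a sum over $x$, and at each $x$ a classifier pays the probability mass of the label it does not choose, so picking the label with the larger mass minimizes every term separately.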
- pattern minor 112d ago
  Language Classification + AWS ML: what am I doing wrong?

  I'm evaluating Amazon's machine learning platform, and thought that I would give it a "simple" classification problem first. As a disclaimer, I am quite new to machine learning (hence my interest in an ML platform). The classification problem is language detection: given a list of 20k words and their language (`English, French, or Random`), train a model to classify new words. My data is structured in CSV format, with 2 columns:

  ```
  dàagzj, random
  tunisia, english
  craindre, french
  voters, english
  religions, english
  condition, french
  ...
  ```

  I imported the data successfully into the platform, and all seems fine. When I attempt to train a model (using both the default settings and tweaked ones), I get the same result: English is selected as the language nearly 100% of the time. I know it is possible to get reasonably accurate results on this problem with simple neural networks, but I'm not sure what is going wrong. Do I need to perform any preprocessing operations on the text input, or is the plain string sufficient? What data can be collected about a single word that may be a more effective input to a machine learning model?
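One common way to expose more information about a single word than the plain string is to break it into character n-grams. The sketch below is a minimal illustration of that idea, not something specific to Amazon's platform; the function name `char_ngrams` and the `^`/`$` boundary markers are assumptions made for the example.

```python
from collections import Counter


def char_ngrams(word, n_min=2, n_max=3):
    """Count character n-grams of a word, with ^ and $ marking its boundaries."""
    padded = f"^{word.strip().lower()}$"
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


features = char_ngrams("voters")
# Space-separated tokens like these could be supplied as a text column
# instead of the raw word (an assumption about how the data is fed in).
tokens = " ".join(sorted(features))
```

With n-grams, words such as `craindre` and `condition` share pieces like `nd` or `on`, which gives a model something to generalize from, whereas whole words almost never repeat between training and test data.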
- pattern minor 112d ago
  Machine learning classification with financial instrument/time series data

  I am new to machine learning and have started brainstorming some model ideas that involve financial instrument/time series data. I was thinking it might be useful to use a classification algorithm to predict whether an instrument was in fact up or down y% (TRUE/FALSE) after n days, based on, e.g., a combination of technical indicator states for each learning example. That said, in researching the idea I came upon an article stating that training examples in time series data are not independent of each other:

  "Time series data has a natural temporal ordering - this differs from typical data mining/machine learning applications where each data point is an independent example of the concept to be learned, and the ordering of data points within a data set does not matter"

  My question is as follows: is the above true only if we are trying to predict the continuous value of an asset n days into the future? As far as the idea outlined in the first paragraph is concerned, would it still be valid considering I am not (apparently) taking into account the specific relationship
- pattern minor 112d ago
  What method of collective recognition to use for digit recognition?

  The structure of the question is as follows: first, I introduce the concept of collective recognition; next, I explain the various methods of group classification that I found; finally, I pose the question. Experts in this field who do not need the explanations can just look at the headlines and go straight to the question.

  What is collective recognition/classification

  Collective recognition is the task of using multiple classifiers (committee, ensemble, etc.), each of which decides on the class of an entity, and then coordinating their decisions with the help of a certain algorithm. Using a set of classifiers typically leads to higher recognition accuracy and better computational efficiency.

  Some approaches to integrating the decisions of multiple classifiers:

  - methods based on the concept of classifiers’ competence areas and on procedures that assess the competence of classifiers with respect to each input of the classification system;
  - methods for combining classifier decisions based on the use of neural networks.

  Competence areas method

  The idea of collective classification based on competence areas is that each base classifier can work well in some area of the feature space (its area of competence), surpassing the remaining classifiers in this area in terms of accuracy and reliability of decisions. The area of competence of each base classifier must somehow be estimated. The program that does this is called a referee. The classification task is solved so that each algorithm is used only in its own competence area, i.e. where it produces the best results compared to the other classifiers. In each area, the decision of only one classifier is taken into account. However, there must be some algorithm that, for any input, determines which of the classifiers is the most competent.

  So, one approach suggests that with each classifier a special algorithm (the referee, which is designed to assess th
- pattern minor 112d ago
  Classification training data, but regression prediction

  Suppose I'm performing machine learning on a simple dataset, and have a bunch of training data of the form:

  ```
  x (feature)    y (label)
  -----------------------
  1              0
  2              1
  3              1
  4              0
  5              1
  6              1
  ...
  ```

  Here the labels take values in two classes, $\{0, 1\}$. Clearly, this training data leads one to believe that it will be a classification task. However, suppose I want to output instead the probability that a given feature value belongs to class $1$. Then my output looks more like a regression task. Consequently, when I'm designing a simple neural network with just a single input layer and a single output layer, how many output units should I have? Should I have two output units, one for each class, and if so, how do I ensure that each pair of outputs will be a valid probability distribution (i.e. sum to one)? Or should I have only one output unit, and treat the entire problem as a regression task? There are probably pros/cons to each approach... thanks for your help!
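The two designs in the question are closely related: a two-unit softmax output sums to one by construction, and its probability for class $1$ equals a single sigmoid unit applied to the difference of the two logits. The sketch below checks this numerically; the logit values are arbitrary and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Single output unit: maps a logit to P(y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax2(z0, z1):
    """Two output units: maps logits (z0, z1) to (P(y=0|x), P(y=1|x))."""
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    total = e0 + e1
    return e0 / total, e1 / total


z0, z1 = -0.4, 1.3  # arbitrary example logits
p0, p1 = softmax2(z0, z1)

# The softmax outputs form a valid distribution ...
assert abs(p0 + p1 - 1.0) < 1e-12
# ... and P(y=1|x) matches one sigmoid unit on the logit difference.
assert abs(p1 - sigmoid(z1 - z0)) < 1e-12
```

In either design, the network is usually trained with a cross-entropy loss on the 0/1 labels rather than as a plain squared-error regression, so the output can be read as a probability.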
- pattern minor 112d ago
  What is a visual bag of words and how is it implemented?

  I'm currently working on implementing a bag of visual words in Python. I get the general gist of how it works, but I can't seem to find any sources that explain it in enough detail for me to implement it. I'm guessing scikit-learn and scikit-image would come into it, but I can't seem to point myself in the right direction. Any help?
- pattern minor 112d ago
  Simple Bayesian classification with Laplace smoothing question

  I'm having a hard time getting my head around smoothing, so I've got a very simple question about Laplace/add-one smoothing based on a toy problem I've been working with. The problem is a simple Bayesian classifier for periods ending sentences vs. periods not ending sentences, based on the word immediately before the period. I'm collecting the following counts in training: number of periods, number of sentence-ending periods (for the prior), words and counts for words before sentence-ending periods, and words and counts for words before not-sentence-ending periods. With add-one smoothing, I understand that

  $$P(w|\text{ending}) = \frac{\text{count}(w,\text{ending}) + 1}{\text{count}(w) + N},$$

  where $P(w|\text{ending})$ is the conditional probability for word $w$ appearing before a sentence-ending period, $\text{count}(w,\text{ending})$ is the number of times $w$ appeared in the training text before a sentence-ending period, $\text{count}(w)$ is the number of times $w$ appeared in the training text (or should that be the number of times it appeared in the context of any period?), and $N$ is the "vocabulary size". The question is, what is $N$? Is it the number of different words in the training text? Is it the number of different words that appeared in the context of any period? Or just in the context of a sentence-ending period?
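For reference, here is a minimal sketch of add-one smoothing as it is usually stated for naive Bayes: the denominator is the total count of words observed in the class plus the vocabulary size. Note that this differs from the formula quoted in the question, which uses $\text{count}(w)$ in the denominator. The choice of vocabulary (every distinct word seen before any period) is an assumption for this toy setup, and the counts are invented.

```python
from collections import Counter

# Invented toy counts: words seen immediately before a period, per class.
before_ending = Counter({"dog": 3, "house": 2, "it": 1})
before_not_ending = Counter({"Mr": 4, "Dr": 2, "it": 1})

# Assumed vocabulary: every distinct word seen before any period.
vocabulary = set(before_ending) | set(before_not_ending)


def smoothed_prob(word, class_counts, vocab):
    """Add-one estimate of P(word | class) over the given vocabulary."""
    total = sum(class_counts.values())
    return (class_counts[word] + 1) / (total + len(vocab))


p_dog_ending = smoothed_prob("dog", before_ending, vocabulary)
p_mr_ending = smoothed_prob("Mr", before_ending, vocabulary)  # unseen in class

# With this choice of N the smoothed probabilities sum to one over the vocabulary.
total_mass = sum(smoothed_prob(w, before_ending, vocabulary) for w in vocabulary)
assert abs(total_mass - 1.0) < 1e-12
```

The final assertion is one way to test a candidate definition of $N$: whichever set of words is used, the smoothed estimates should still form a probability distribution over that same set.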
- pattern minor 112d ago
  Differences between SISD, SIMD and MIMD architecture (Flynn classification)

  I have a problem with assigning certain CPUs to the proper classes of Flynn's taxonomy.

  1. Zilog Z80

     According to this article on Sega Retro, the Z80 has limited abilities to be classified as SIMD:

     "a limited ability for SIMD (Single Instruction, Multiple Data) with instructions to perform copy, compare, input, and output over contiguous blocks of memory;"

     From what I understand, the Z80 usually behaves as SISD, but when it comes to things like copying or comparing, it is able to process multiple data items using a single instruction. How should we classify the Z80 then? Is the ability to act as a SIMD processor an argument for or against saying that the Z80 implements a SIMD architecture?

  2. Intel i5 (dual core)

     From what I understand, we classify multicore CPUs as MIMD. Is it as simple as that?

  3. ARM Cortex-A15 (single core)

     I'd classify the architecture of this processor as SIMD. Wikipedia says that it has a superscalar pipeline, but as we know from "Why is a superscalar processor SIMD?", multiple pipelines do not imply the MIMD model. Do "modern" single cores usually implement the SIMD model or not?