Recent Entries 10
- pattern minor 112d agoVectorizing the product of an array of complex numbersI am trying to write fast/optimal code to vectorize the product of an array of complex numbers. In simple C this would be: ``` #include complex float f(complex float x[], int n ) { complex float p = 1.0; for (int i = 0; i < n; i++) p *= x[i]; return p; } ``` However, gcc can't vectorize this and my target CPU is the AMD FX-8350 which supports AVX. To speed it up I have tried: ``` typedef float v4sf __attribute__ ((vector_size (16))); typedef union { v4sf v; float e[4]; } float4; typedef struct { float4 x; float4 y; } complex4; static complex4 complex4_mul(complex4 a, complex4 b) { return (complex4){a.x.v*b.x.v -a.y.v*b.y.v, a.y.v*b.x.v + a.x.v*b.y.v}; } complex4 f4(complex4 x[], int n) { v4sf one = {1,1,1,1}; complex4 p = {one,one}; for (int i = 0; i < n; i++) p = complex4_mul(p, x[i]); return p; } ``` Can this code be improved for AVX and is it as fast as it can get?
- pattern minor 112d agoVectorizing a pixel-averaging operation in NumpyI am reading from a file containing some segments (irregular parcels of the image) and trying to average the entire segment to have one pixel value. This is the code I use: ``` band = band[:,:,0] #take the first band of the image for i in range(numSegments): #for every segment tx = band[segments==i] #select all the pixels in segment avg = np.average(tx) #average the values band[segments==i] = avg #write the average back into the image ``` I am omiting some transformation steps and code for printing running time from the snippet. This takes quite sometime to run for even one band. Almost 1000 seconds. I was wondering if there is a way to vectorize this operation to make it faster? Data: `Segment2009`: an image of all the segments in the image. This is what the segments look like: `Bands`: 3000x3000 pixels, `float32` values. Full context: ``` workFolder = '/home/shaunak/Work/ChangeDet_2016/SLIC/003_lee_m0_alpha' bandlist=os.path.join(workFolder,'bandlist.txt') configfile = os.path.join(workFolder,'config.txt') segmentfile = os.path.join(workFolder,'Segments2009') #%% Load the bands -- can refer to subfolders in the bandlist files = utilities.readBandList(bandlist) destinations = [] for f in files: destinations.append(f.split('.')[0]+"_SP."+f.split('.')[1]) (lines,samples,bands) = utilities.readConfigImSizeBand(configfile) #%% Superpixel file segments = np.fromfile(segmentfile,dtype='float32') segments = np.reshape(segments,(lines,samples)) numSegments = int(np.max(segments)) #%% simple avg for idx,f in enumerate(files): band = np.fromfile(f,dtype='float32').reshape((lines,samples)) start = time.time() for i in range(numSegments): tx = band[segments==i] avg = np.average(tx) band[segments==i] = avg band.tofile(destinations[idx]) ``` I am writing the values back to the original after averaging. It is not necessary, also not the most expensive part -- and helps me visualize the results bett
- pattern minor 112d agoVectorizing Two Sample Kolmogorov-Smirnov testI'm implementing the Two-sample Kolmogorov-Smirnov test in MATLAB. I must admit I know very little of formal statistics and was simply trying to implement the description in wikipedia's page. Right now I take as input the two vectors to be compared `x1` and `x2`, and optionally the p-value that the check will run on. It outputs `1` if the vectors pass the test and `0` if they do not. A couple of questions I have: Is there a better way to compute the empirical distribution functions? Right now I evaluate the empirical distribution function at the points defined in the vector `t`. In my naive approximation, I made it go from the smallest value of the vectors to the largest, using twice as many points as the vector with the most points. I have no idea is this is the best way to compute it, and suspect the vector doesn't have to be nearly as dense. Can I vectorize the search through the `t` vector? Right now I have a for loop that goes through every value of `t`and computes the empirical distribution function in that way. It seems that there might be a way to vectorize this operation further and get rid of the for loop. Haven't figured out a good way to do it however. And of course, any other suggestions are welcome! My code is below: ``` function [OUT] = k_stest(x1,x2,p) % Set default p-value if nargin == 2 p = 0.05; end % Compute lenghts N1 = length(x1); N2 = length(x2); % Set vector t over which the empirical distribution function will be computed. t = linspace(min(min([x1 x2])),max(max([x1 x2])),2*max([N1 N2])); % Initialize the statistic, negative values guarantees it will be overwritten by first call. D = -1; % Set c from tabulated values if p == 0.10 c = 1.22; elseif p == 0.05 c = 1.36; elseif p == 0.025 c = 1.48; elseif p == 0.01 c = 1.63; elseif p == 0.005 c = 1.73; elseif p == 0.001 c = 1.95; else disp('Invalid p-value. O
- pattern minor 112d agoExtract the largest value for each day from a matrixI have a matrix in which the right-most elements are repeated YYYYMMDD dates in descending order, for example: ``` 40 1122 1711 20160326 169 700 950 20160326 40 1630 1711 20160326 182 700 950 20160327 40 1029 1711 20160327 169 700 950 20160327 40 1630 1711 20160327 122 700 950 20160328 40 1630 1711 20160328 169 700 950 20160328 40 1630 1711 20160328 3049 700 950 20160331 40 1630 1711 20160331 3049 700 950 20160331 40 1630 1711 20160331 169 700 950 20160401 40 1630 1711 20160401 169 700 950 20160401 40 1630 1711 20160401 ``` Within each date, I want to keep only the row that corresponds to the largest element in the leftmost column. So I would like to produce a new matrix: ``` 169 700 950 20160326 182 700 950 20160327 169 700 950 20160328 3049 700 950 20160331 169 700 950 20160401 ``` The code I have now is: ``` idx1 = find([1;diff(A(:,4))]); idx2 = find([diff(A(:,4));1]); B = zeros(length(idx1),4); for ii = 1:length(idx1) row_number = find(A(idx1(ii):idx2(ii),1) == max(A(idx1(ii):idx2(ii),1)),1); B(ii,:) = A(idx1(ii)+row_number-1,:); end ``` Are there ways to improve this code? I'm looking for coding conventions, improve performance, possible vectorization etc.
- pattern minor 112d agoBiometric matching score calculatorI'm working on a biometric matching system and I would like to have few suggestions regarding vectorizing the following code: ``` % tpol and ipol % column 1: x co-ordinate of the feature % column 2: y co-ordinate of the feature % column 3: angle orientation % column 4: type of feature ref_points=size(tpol,1); %number of rows=number of reference image points in_points=size(ipol,1); %number of rows=number of input image points radsize=7*ones(ref_points,1); %Radius size angsize=11* ones(ref_points,1); %Angle size radlow=-radsize./2; % Lower Radius radhigh=radsize./2; % Upper Radius anglow=-angsize./2; % Lower Angle anghigh=angsize./2; % Upper Angle epsillon=10; mscore=0; % initializing the matching score for i=1:ref_points for j=1:in_points rdiff=tpol(i,1)-ipol(j,1); % Difference between the x-coordinate ediff=tpol(i,2)-ipol(j,2); % Difference between the y-coordinate thetadiff=tpol(i,3)-ipol(j,3); %Difference between the orientation if ((radlow(i) < rdiff) && (rdiff < radhigh(i)) && (anglow(i) < ediff) && ... (ediff < anghigh(i)) && (abs(thetadiff) < epsillon) && ... (tpol(i,4)==ipol(j,4))) mscore=mscore+1; tpol(i,4)=3; %Change type of the feature to know that the line was used end end end end ``` `ipol` and `tpol` are the input image and reference image points respectively. They are both a matrix of size (A x 4) each, A being the number of features detected in each image. I asked this question on Mathworks for which I received the following answer: ``` r = [-3.5 3.5]; % Lower Radius and Upper Radius a = [-5.5 5.5]; % Lower Angle and Upper Angle epsillon=10; % Vectorized computation of rdiff,ediff,thetadiff i.e. x,y-co-ordinate % and orientation pol = bsxfun(@minus,tpol,permute(ipol,[3 2 1])); p = all([bsxfun(@gt,pol(:,1:2,:),[r(1),a(1)]) & bsxfun(@lt,pol(:,1:2,:),[r(2),a(2)]),...
- pattern minor 112d agoEvaluating a series of Legendre polynomialsThe following function represents the electrostatic potential, in spherical coordinates, due to a ring of charge \$q=1\$ and radius \$R=1\$, placed in the plane \$x\$-\$y\$: $$\phi(r,\theta) = \sum_{n=0}^\infty \frac{r_\min^n}{r_\max^{n+1}} P_n(\cos\theta) P_n(0) $$ where \$P_n\$ are the Legendre Polynomials of degree \$n\$, \$r_\min = \min(r,1)\$, and \$r_\max = \max(r,1)\$. I wrote a code in python to plot this function, with the sum truncated to \$0\leq n \leq 35\$, in a cartesian grid of 50x50 points. But this takes several minutes to complete. The same code in other languages gives almost instantaneous results in mi computer. This is the code I want to improve: ``` import matplotlib.pyplot as plt # plotting routines import scipy.special as sp # legendre polynomials import numpy as np from timeit import default_timer as timer # Potential from a ring of radius c=1 and charge q=1 # phi(r,th) = \sum_{n=0}^\infty (rmin^n/rmax^{n+1}) P_n(cos(th)) P_n(0.0) # # where rmin=min(r,1) and rmax=max(r,1). # The potential is normalized with respect to q/c, where q is the charge of the ring. # The radii is normalized with respect to c. def phi(x, z): r = np.sqrt(x**2 + z**2) costh = z/r rmin = min(r,1) rmax = max(r,1) dr = rmin/rmax suma = 0.0 for n in range(0,35): suma += dr**n * sp.legendre(n)(costh) * sp.legendre(n)(0.0) return suma/rmax phivec = np.vectorize(phi) # grid for plotting X, Y = np.meshgrid( np.linspace(-2,2,50), np.linspace(-2,2,50) ) # evaluating phi(X,Y) start = timer() Z = phivec(X,Y) end = timer() print "time = ", end-start # plotting phi plt.figure() CS = plt.contour(X, Y, Z) plt.clabel(CS, inline=1, fontsize=10) plt.title(r'$\psi(r,\theta) = \sum_{\ell=0}^\infty \frac{r_^{\ell+1}} P_\ell(0) P_\ell(\cos\theta)$', y=1.03) plt.xlabel(r'$x/c$', fontsize=20) plt.ylabel(r'$z/c$', fontsize=20) plt.show() ``` Here is the result:
- pattern minor 112d agoVectorizing a polar coordinate conversion loop in a fingerprint-matching processI am working on a fingerprint matching technique and it takes a lot of computation time to obtain the result. I am trying to implement the matching process given in A Minutia Matching Algorithm in Fingerprint Verification. I would like to know if there is any method available to reduce the `for` loop for this code: ``` [Tdrow,Tdcol]=size(Td); [Idrow,Idcol]=size(Id); matchingscore=zeros(Tdrow,Idrow); rv = bsxfun(@minus,Td(:,3),permute(Id(:,3),[2 1])); %convert each minutiae point to polar coordinates with respect to the %reference minutiae in each case for i=1:Tdrow for j=1:Idrow Tdp=zeros(Tdrow,Tdcol); Idp=zeros(Idrow,Idcol); tref=Td(i,:); iref=Id(j,:); Tdp(:,1)=sqrt((Td(:,1)-tref(1)).^2 + (Td(:,2)-tref(2)).^2); Tdp(:,2)=mod(atan2(tref(1)-Td(:,1),Td(:,2)-tref(2)) * 180/pi,360); Tdp(:,3)=mod(Td(:,3)-tref(3),360); Tdp(:,4)=Td(:,4); Idp(:,1)=sqrt((Id(:,1)-iref(1)).^2 + (Id(:,2)-iref(2)).^2); Idp(:,2)=mod((atan2(iref(1)-Id(:,1),Id(:,2)-iref(2)) * 180/pi) + rv(i,j),360); Idp(:,3)=mod(Id(:,3)-iref(3),360); Idp(:,4)=Id(:,4); Tdp1 {i,j}= Tdp(:,:); Idp1 {i,j}= Idp(:,:); end end ``` Here, the variable `Td` is the fingerprint template to be compared and `Id` is the input fingerprint template. Both `Td` and `Id` have four columns, i.e. 1st and 2nd column provide x and y coordinate respectively, 3rd column is the angle and 4th column is the type of minutiae. `Tdp` and `Idp` are calculated using the equation provided in the attachment provided in the link. Can this code be vectorized? I understand that the indexes have hardly been used. But without changing the index i cannot continue to the further steps. It would be very helpful is anyone could provide me with any suggestions. The data values are available on Google Drive. These has both the input data and output data.
- pattern minor 112d agoRolling regressions in RI want to run rolling regressions over each group and store the coefficient. The following works, but it's slow, since I have too many series and I want to run too many regressions for each group. Using `foreach()`, speeds things up (and also getting the coefficient with `(X'X)^{-1}X'Y` but is there a way to vectorize this operation? Also open for any and all improvements. ``` library(data.table) run.rolling.regressions <- function(x) { DT <- data.table( Y = rnorm(10000), X = rnorm(10000), key.group = rep(LETTERS[1:10], each = 1000)) window.length <- 12 names.of.groups <- unique(DT$key.group) number.of.groups <- length(names.of.groups) X.coefficients <- list() for(j in 1:length(names.of.groups)) { regressed.DT <- DT[key.group == names.of.groups[j]] nrows.of.group <- nrow(regressed.DT) print(paste0(j, ', key.group: ', names.of.groups[j])) for (i in window.length:nrows.of.group) { if(i == window.length) { X.coefficients[[names.of.groups[j]]] <- c(rep(NA, nrows.of.group)) } X.coefficients[[names.of.groups[j]]][i] <- lm(Y ~ 1 + X, data = regressed.DT[(i - 11):i])$coefficients['X'] } } return(X.coefficients) } system.time(X.coef <- run.rolling.regressions()) ```
- pattern minor 112d agoJacobian productDo you have in mind a way to compute the following sum $$R_{i,j,k} = \sum_{l=0}^{2} J_{i,j,l+3k} v_{i,j,l}$$ obviously in a vectorised fashion? This may sound like an equivalent of: ``` c = np.einsum('ij(l + 3k),ijl->ijk', j, v) ``` where of course the notation 'ij(l + 3k),ijl->ijk' is not accepted by einsum. I tried with the following, but I wish I could use one of the existing NumPy or SciPy functions: ``` def jacobian_product(j_input, v_input): """ :param j_input: jacobian m x n x (4 or 9) jacobian column major :param v_input: matrix m x n x (2 or 3) to be multiplied by the jacobian. :return: m x n x (2 or 3) whose each element is the result of the product of the jacobian (i,j,:) multiplied by the corresponding element in the vector v (i,j,:). In tensor notation: R_{i,j,k} = \sum_{l=0}^{2} J_{i,j,l+3k} v_{i,j,l} """ # dimensions of the problem: d = v_input.shape[-1] vol = list(v_input.shape[:-1]) # repeat v input 3 times, one for each row of the jacobian matrix 3x3 or 2x2 in corresponding position: v = np.tile(v_input, [1]*d + [d]) # element-wise product: j_times_v = np.multiply(j_input, v) # Sum the three blocks in the third dimension: return np.sum(j_times_v.reshape(vol + [d, d]), axis=d+1).reshape(vol + [d]) ``` Update after Veedrac comment: here there are two tests for both 2d and 3d vectors. The jacobians are collected into an array of shape [m,n,4] or [m,n,9] (for 2d or 3d. The first equation proposed in this thread is about the 3d case). Vectors to be multiplied by the Jacobian in the corresponding positions have shape [m,n,2] or [m,n,3] respectively. Sorry if this was not clear, my bad I didn't state dimensions since the beginning! ``` from numpy.testing import assert_array_equal def test_jacobian_product_toy_example_2d(): def function_v(t, x): t = float(t); x = [float(y) for y in x] return x[1], -1 * x[0] def function_jacobian(t, x): t = f
- pattern minor 112d agoRemove duplicates in list of lists where an item in each list is ignored when determining duplicityI have `lines_list_copy` of the form: ``` [['a1', 'b1', 'c1', 'd1', 'e1'], ['a2', 'b2', 'c2', 'd2', 'e2'], ... ] ``` I needed to remove all duplicate entries where `a`,`b`,`c`,`d` are identical. So note that I don't care about what value `e` has. So for example, If `lines_list_copy = [['a1', 'b1', 'c1', 'd1', 'e1'], ['a2', 'b2', 'c2', 'd2', 'e2'], ['a1', 'b1', 'c1', 'd1', 'e1'], ['a1', 'b1', 'c1', 'd1', 'e2']]` we have three values that are the same, namely `lines_list_copy[0]`, `lines_list_copy[2]` and `lines_list_copy[3]` and any 2 of them need to be deleted which will give us the value of `lines_list`. At the end deleting any two results in a valid outputs for `lines_list` `lines_list_copy` has lengths typically exceeding 200000 and realistically will eventually exceed 500000 with the amount of data we are collecting. Thus I needed a way to remove duplicates fast. I found a way to efficiently remove all duplicates, but the method would take `e` into account and thus wouldn't give me what I need. Therefore, I delete all the `e` values in each list first like so: ``` for x in lines_list_copy: del x[cfg.TEXT_LOC_COL] lines_list_copy = [list(x) for x in set(tuple(x) for x in lines_list_copy)] ``` After which I have `lines_list_copy` as I need it. All I need to do is re-add any one of the `e` values for each list. My double for loop is admittedly naive and more so that I didn't think it would bring my program to a crawl. ``` for line_copy_ind in range(len(lines_list_copy)): for line_ind in range(len(lines_list)): if lines_list_copy[line_copy_ind][cfg.TIME_COL] == lines_list[line_ind][cfg.TIME_COL] and \ len(lines_list_copy[line_copy_ind]) == 4: lines_list_copy[line_copy_ind].append(lines_list[line_ind][cfg.TEXT_LOC_COL]) lines_list = lines_list_copy ``` I looked into vectorizing and using filter but just cant seem to reverse engineer solutions to other problems and make them work for my problem of a