Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites (based-on a Library file), which could be used as an input for QuateXelero Algorithm. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The function reads a Metabolic Network SBML file, and builds Directed Enzyme-Enzyme Networks which is compatible with QuateXelero Algorithm. The QuateXelero is one of the best motif finding algorithms which is recently developed by Kavosh developer team. http://lbb.ut.ac.ir/Download/LBBsoft/QuateXelero So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network. The Remove Currency Metabolites based-on Library (RCMLib) algorithm removes currency metabolites in the metabolic network automatically IF AND ONLY IF the currency metabolits exists in the Library file. Note: COBRA Toolbox must be installed in MATLAB before running this function [Output] = enz_cent_RCMLib_dir_quatexelero(fileName1,fileName2) INPUTS fileName1 The Library file includes pre-defined currency metabolites (in .txt format) Note: Library text file must include one metabolites per line (all in one column) fileName2 The metabolic Network in the SBML format OUTPUTS *_Enzyme_Cent_RCMLib_Dir_Index.dat Matrix Indeces of Enzyme-Enzyme Connections *_Enzyme_Cent_RCMLib_Dir_QuateXelero.dat Directed-Enzyme-Enzyme Network - QuateXelero Compatible Yazdan Asgari 07/16/2016 http://yazdan59.ir/scan %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0001 function [Output] = enz_cent_RCMLib_dir_quatexelero(fileName1,fileName2) 0002 % Builds Directed Enzyme-Enzyme Networks Removing Currency Metabolites (based-on a Library file), which could be used as an input for QuateXelero Algorithm. 0003 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0004 % The function reads a Metabolic Network SBML file, 0005 % and builds Directed Enzyme-Enzyme Networks which is compatible with QuateXelero Algorithm. 0006 % The QuateXelero is one of the best motif finding algorithms which is recently developed by Kavosh developer team. 0007 % http://lbb.ut.ac.ir/Download/LBBsoft/QuateXelero 0008 % So, one could easily use this algorithm in order to find motifs in different sizes for the metabolic network. 0009 % The Remove Currency Metabolites based-on Library (RCMLib) algorithm removes currency metabolites 0010 % in the metabolic network automatically IF AND ONLY IF the currency metabolits exists in the Library file. 0011 % Note: COBRA Toolbox must be installed in MATLAB before running this function 0012 % 0013 % [Output] = enz_cent_RCMLib_dir_quatexelero(fileName1,fileName2) 0014 % 0015 %INPUTS 0016 % fileName1 The Library file includes pre-defined currency metabolites (in .txt format) 0017 % Note: Library text file must include one metabolites per line (all in one column) 0018 % fileName2 The metabolic Network in the SBML format 0019 % 0020 %OUTPUTS 0021 % *_Enzyme_Cent_RCMLib_Dir_Index.dat Matrix Indeces of Enzyme-Enzyme Connections 0022 % *_Enzyme_Cent_RCMLib_Dir_QuateXelero.dat Directed-Enzyme-Enzyme Network - QuateXelero Compatible 0023 % 0024 % Yazdan Asgari 07/16/2016 http://yazdan59.ir/scan 0025 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0026 0027 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0028 % check validity of input files format 0029 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0030 check1=regexp(fileName1,'.txt'); 0031 assert(~isempty(check1),'Error in the first input: The fileName1 must contain .txt at its end') 0032 check2=regexp(fileName2,'.xml'); 0033 assert(~isempty(check2),'Error in the second input: The fileName2 must contain .xml at its end') 0034 0035 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0036 % start time evaluation of program 0037 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0038 tic; 0039 0040 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0041 % reading the Library text file and construct array of currency metabolites 0042 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0043 fid = fopen(fileName1); 0044 tline = fgetl(fid); 0045 i=1; 0046 Curr_met={}; 0047 while ischar(tline) 0048 Curr_met{i,1}=tline; 0049 tline = fgetl(fid); 0050 i=i+1; 0051 end 0052 fclose(fid); 0053 [h,g]=size(Curr_met); 0054 0055 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0056 % reading the SBML file using COBRA Toolbox Command, and sets size of the S matrix 0057 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0058 model=readCbModel(fileName2); 0059 [m,n]=size(model.S); 0060 0061 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0062 % calculate summation of each columns (i.e. How many metabolites each enzyme correlates) 0063 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0064 S_bin=zeros(size(model.S)); 0065 S_bin(find(model.S))=1; 0066 CB=sum(S_bin,1); 0067 A=zeros(m,n); 0068 B=zeros(m,1); 0069 N3=zeros(m,1); 0070 N_curr=zeros(m,1); 0071 0072 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0073 % reading the Metabolites array and check their availability in the library text file 0074 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0075 for q=1:m 0076 for i=1:h 0077 if strcmp(model.metNames{q},Curr_met{i,1})==1 0078 N_curr(q,1)=N_curr(q,1)+1; 0079 end 0080 end 0081 end 0082 0083 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0084 % for each binary S-matrix element, subtracts its value from the column summation and put the result in the A matrix. 0085 % A(q) means the metabolite q connects to how many other metabolites through the enzyme i. 0086 % for each row, sums the binary S-matrix over all columns. 0087 % B(q) means how many enzymes the metabolite q correlates. 0088 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0089 for q=1:m 0090 for i=1:n 0091 if S_bin(q,i)~=0 0092 A(q,i)=CB(1,i)-S_bin(q,i); 0093 end 0094 B(q,1)=B(q,1)+S_bin(q,i); 0095 end 0096 end 0097 0098 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0099 % Assumption: Generally, every metabolite is connected to the other one through a specific enzyme. 0100 % If a metabolite connects to more than one metabolite through an enzyme, this will be considered as a suspicious case. 0101 % Therefore, every A(q) value equal to 3 will be marked for further analysis. 0102 % In addition, the availability of the metabolite in the library file will be checked. 0103 % So, the metabolites which do not exist in the library file, will not select for further analysis. 0104 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0105 for q=1:m 0106 for i=1:n 0107 if A(q,i)==3 && N_curr(q,1)~=0 0108 N3(q,1)=N3(q,1)+1; 0109 end 0110 end 0111 end 0112 0113 s=0; 0114 for q=1:m 0115 if N3(q,1)~=0 0116 s=1; 0117 end 0118 end 0119 0120 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0121 % If there is any value for N3 array, the RCM algorithm will be done. 0122 % This algorithm will be deleted the most probable metabolite among all (i.e. the one with the maximum value of N3 and C) 0123 % The selected metabolite will be deleted in the binary S-Matrix, and the "WHILE LOOP" repeated. 0124 % The algorithm is ended if there is not any suspicious case. 0125 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0126 while s==1 0127 C=zeros(m,1); 0128 max1=max(N3,[],1); 0129 for q=1:m 0130 if N3(q,1)==max1 0131 C(q,1)=B(q,1); 0132 else 0133 C(q,1)=0; 0134 end 0135 end 0136 max2=max(C,[],1); 0137 for q=1:m 0138 if ( (N3(q,1)==max1) && (C(q,1)==max2) ) 0139 for i=1:n 0140 S_bin(q,i)=0; 0141 model.S(q,i)=0; 0142 end 0143 end 0144 end 0145 0146 CB=sum(S_bin,1); 0147 A=zeros(m,n); 0148 B=zeros(m,1); 0149 N3=zeros(m,1); 0150 for q=1:m 0151 for i=1:n 0152 if S_bin(q,i)~=0 0153 A(q,i)=CB(1,i)-S_bin(q,i); 0154 end 0155 B(q,1)=B(q,1)+S_bin(q,i); 0156 end 0157 end 0158 for q=1:m 0159 for i=1:n 0160 if A(q,i)==3 && N_curr(q,1)~=0 0161 N3(q,1)=N3(q,1)+1; 0162 end 0163 end 0164 end 0165 s=0; 0166 for q=1:m 0167 if N3(q,1)~=0 0168 s=1; 0169 end 0170 end 0171 end 0172 0173 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0174 % building the output file name 0175 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0176 outname1=strrep(fileName2,'.xml','_Enzyme_Cent_RCMLib_Dir_Index.dat') 0177 fout1 = fopen(outname1, 'w+'); 0178 0179 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0180 % finds non-zero elements of the S-matrix (in order to make the algorithm faster), 0181 % parses through each row, and considers an edge for every unlike-signs. 0182 % It also consider Reversibility for Enzyme-Enzyme network. 0183 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0184 num=zeros(size(model.rxns)); 0185 for j=1:m 0186 indices=find(model.S(j,:)); 0187 [a,b]=size(indices); 0188 r=0; 0189 if b~=0 0190 r=1; 0191 end 0192 while r<b 0193 i=1; 0194 while i<(b-r+1) 0195 if model.S(j,indices(1,r))>0 && model.S(j,indices(1,r+i))<0 0196 if model.rev(indices(r+i))==1 && model.rev(indices(r))==1 0197 fprintf(fout1,'%d\t%d\n',indices(1,r),indices(1,r+i)); 0198 fprintf(fout1,'%d\t%d\n',indices(1,r+i),indices(1,r)); 0199 num(1,indices(1,r))=1; 0200 num(1,indices(1,r+i))=1; 0201 else 0202 fprintf(fout1,'%d\t%d\n',indices(1,r),indices(1,r+i)); 0203 num(1,indices(1,r))=1; 0204 num(1,indices(1,r+i))=1; 0205 end 0206 elseif model.S(j,indices(1,r))<0 && model.S(j,indices(1,r+i))>0 0207 if model.rev(indices(r+i))==1 && model.rev(indices(r))==1 0208 fprintf(fout1,'%d\t%d\n',indices(1,r+i),indices(1,r)); 0209 fprintf(fout1,'%d\t%d\n',indices(1,r),indices(1,r+i)); 0210 num(1,indices(1,r))=1; 0211 num(1,indices(1,r+i))=1; 0212 else 0213 fprintf(fout1,'%d\t%d\n',indices(1,r+i),indices(1,r)); 0214 num(1,indices(1,r))=1; 0215 num(1,indices(1,r+i))=1; 0216 end 0217 end 0218 i=i+1; 0219 end 0220 r=r+1; 0221 end 0222 end 0223 0224 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0225 % building the output file name 0226 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0227 outname2=strrep(fileName2,'.xml','_Enzyme_Cent_RCMLib_Dir_QuateXelero.dat') 0228 fout2=fopen(outname2,'w+'); 0229 0230 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0231 % reading the constructed Enzyme-Enzyme network file and re-format it to a Kavosh-compatible file. 0232 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0233 fid = fopen(outname1); 0234 C=fscanf(fid,'%d'); 0235 i=1; 0236 while isinteger(fid) 0237 C(i)=fscanf(fid,'%d',C); 0238 i=i+1; 0239 end 0240 g=size(C); 0241 A=size(unique(C)); 0242 if g~=0 0243 n=1; 0244 else 0245 disp('Error in reading the file, No Edge detected') 0246 end 0247 k=1; 0248 j=2; 0249 last=g/2; 0250 fprintf(fout2,'%d\n',A(1,1)); % total number of uniques nodes in the network (needed for Kavosh Algorithm) 0251 for i=1:last 0252 fprintf(fout2,'%d\t%d\n ',C(k),C(j)); 0253 k=k+2; 0254 j=j+2; 0255 end 0256 fclose(fid); 0257 fclose(fout2); 0258 0259 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0260 % End of time evaluation of program 0261 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0262 toc;