MATLAB - Import mixed CSV that has quotes around text
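I am importing a comma-delimited CSV file into MATLAB. Each column has quotes around it, and I want everything inside the quotes to be treated as text, including any commas.

I am using the read_mixed_csv function from a reply to this question to read the data into a cell array: Import CSV file with mixed data types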
    thisdata = read_mixed_csv(fname, ',');        % reads in the CSV file
    thisdata = regexprep(thisdata, '^"|"$', '');  % strips the surrounding quotes

However, a few of the columns look like this:

    "fairhope, alabama"
    "fairhope high school, fairhope, alabama"
    "daphne-fairhope-foley, al"

and MATLAB places everything after the comma into a new column. For example, "daphne-fairhope-foley, al" becomes 2 columns, "daphne-fairhope-foley and al".

How can I make MATLAB read in a mixed CSV file and not treat a comma as a delimiter when it is inside quotation marks? Is there a more automated way of doing this than textscan? If textscan is the option, what would it look like?
Here is a sample of the data I'm trying to read in, header included:
"state code","county code","site num","parameter code","poc","latitude","longitude","datum","parameter name","sample duration","pollutant standard","date local","units of measure","event type","observation count","observation percent","arithmetic mean","1st max value","1st max hour","aqi","method name","local site name","address","state name","county name","city name","cbsa name","date of lastly change" "01","003","0010","88101",1,30.498001,-87.881412,"nad83","pm2.5 - local conditions","24 hour","pm25 24-hour 2006","2013-01-01","micrograms/cubic meter (lc)","none",1,100.0,7.3,7.3,0,30,"r & p model 2025 pm2.5 sequential w/wins - gravimetric","fairhope, alabama","fairhope high school, fairhope, alabama","alabama","baldwin","fairhope","daphne-fairhope-foley, al","2014-02-11" "01","003","0010","88101",1,30.498001,-87.881412,"nad83","pm2.5 - local conditions","24 hour","pm25 24-hour 2006","2013-01-04","micrograms/cubic meter (lc)","none",1,100.0,7.6,7.6,0,32,"r & p model 2025 pm2.5 sequential w/wins - gravimetric","fairhope, alabama","fairhope high school, fairhope, alabama","alabama","baldwin","fairhope","daphne-fairhope-foley, al","2014-02-11" "01","003","0010","88101",1,30.498001,-87.881412,"nad83","pm2.5 - local conditions","24 hour","pm25 24-hour 2006","2013-01-07","micrograms/cubic meter (lc)","none",1,100.0,8.6,8.6,0,36,"r & p model 2025 pm2.5 sequential w/wins - gravimetric","fairhope, alabama","fairhope high school, fairhope, alabama","alabama","baldwin","fairhope","daphne-fairhope-foley, al","2014-02-11" "01","003","0010","88101",1,30.498001,-87.881412,"nad83","pm2.5 - local conditions","24 hour","pm25 24-hour 2006","2013-01-10","micrograms/cubic meter (lc)","none",1,100.0,7,7,0,29,"r & p model 2025 pm2.5 sequential w/wins - gravimetric","fairhope, alabama","fairhope high school, fairhope, alabama","alabama","baldwin","fairhope","daphne-fairhope-foley, al","2014-02-11" *note: converting csv file tab delimited file makes easier matlab deal , circumvents problem.
Having a text qualifier (like ") is a little tricky, but the following might work if you ensure each row of the table has the same number of columns (and no empty ones).

Anything not within the text qualifier must be convertible to a number.
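The rule above (text only inside qualifiers, numbers everywhere else) is also what lets a parser decide each cell's type. This is a minimal sketch, assuming one line has already been split into raw fields with their quotes still attached. The field values come from the sample data.

Raising an error for an unquoted field that is not numeric is an added choice and is not part of the answer. Plain str2double would silently turn such a field into NaN.

```matlab
% Sketch (assumption: raw fields of one line, quotes still attached).
raw = {'"01"', '1', '100.0', '7.3', '"fairhope, alabama"'};

istext = strncmp(raw, '"', 1);                       % quoted => text cell

out = cell(size(raw));
out(istext) = regexprep(raw(istext), '^"|"$', '');   % drop the qualifiers

nums = str2double(raw(~istext));                     % the rest must be numeric
if any(isnan(nums))
    error('An unquoted field is not a valid number.');
end
out(~istext) = num2cell(nums);
```

Note that out{1} stays the text '01' rather than the number 1, which preserves the leading zeros of codes like the state and county columns.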
    function c = csvmixed(eachline,delim,textqualifier)
    % Outputs a cell containing mixed string and numeric data, given a
    % delimiter (',') and a text qualifier ('"'). Each line of the delimited
    % file must be loaded into the cell array eachline, and each line must
    % have the same number of columns.
    %
    % Example:
    %   fid = fopen('testcsv.txt','r');
    %   eachline = textscan(fid,'%s','delimiter','\n'); fclose(fid);
    %   c = csvmixed(eachline{1},',','"')

    assert(ischar(delim) && numel(delim)==1);
    assert(ischar(textqualifier) && numel(textqualifier)==1);

    % find strings, using the specified text qualifier
    patternstr = sprintf('"([^"]*)"%c?',delim);
    patternstr = strrep(patternstr,'"',textqualifier);
    cstr = regexp(eachline,patternstr,'tokens');

    % find numeric data (fields between delimiters that are not qualified text)
    patternnum = sprintf('(?<=(,|^))[^%c,a-zA-Z]*(?=(,|$))',textqualifier);
    patternnum = strrep(patternnum,',',delim);
    cnum = regexp(eachline,patternnum,'match','emptymatch');

    numcols = cellfun(@numel,cstr) + cellfun(@numel,cnum);
    assert(nnz(diff(numcols))==0,'Number of columns not consistent.')

    % string extents: [start end] indexes of each string
    strextents = regexp(eachline,patternstr,'tokenExtents');

    % deal out the parsed data for each line (c is preallocated)
    c = cell(numel(eachline),numcols(1));
    for ii = 1:numel(eachline)
        strbounds = vertcat(strextents{ii}{:});
        delimlocs = getdelimlocs(eachline{ii},strbounds,delim);
        strcellmap = getcellmap(strbounds,delimlocs);

        c(ii,strcellmap) = [cstr{ii}{:}];
        c(ii,~strcellmap) = num2cell(str2double(cnum{ii})); % everything else must be numeric
    end

    end

    function delimlocs = getdelimlocs(linetext,solidbounds,delim)
    % positions of delimiters that do NOT fall inside a qualified string
    delimcharlocs = strfind(linetext,delim);
    delimlocs = delimcharlocs(~any(bsxfun(@ge,delimcharlocs,solidbounds(:,1)) & ...
        bsxfun(@le,delimcharlocs,solidbounds(:,2)),1));
    end

    function cellmap = getcellmap(typebounds,delimlocs)
    % true for each column (span between real delimiters) that holds a string
    cellmap = any(bsxfun(@gt,typebounds(:,1),[0 delimlocs]) & ...
        bsxfun(@lt,typebounds(:,1),[delimlocs Inf]), 1);
    end

Update: fixed small typos in getdelimlocs and added preallocation of the cell array.
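The core of getdelimlocs and getcellmap is a broadcast comparison with two parts. First, a comma counts as a delimiter only if it lies outside every [start end] token extent of a quoted string. Second, a quoted string belongs to column 1 + (number of real delimiters before its start).

The snippet below runs that logic on one hand-made line so the intermediate values can be inspected. The line and variable names are illustrative. The sum-based column index is an equivalent reformulation of getcellmap, not the answer's exact code.

```matlab
% Illustrative line: two quoted strings, one containing a comma.
rowtext = '"01",1,"fairhope, alabama",7.3';

% [start end] of the text between each pair of quotes, one row per string
te = regexp(rowtext, '"([^"]*)"', 'tokenExtents');
strbounds = vertcat(te{:});                 % [2 3; 9 25]

% a comma is a real delimiter unless it falls inside some string's extents
commas = strfind(rowtext, ',');             % [5 7 17 27]
inside = any(bsxfun(@ge, commas, strbounds(:,1)) & ...
             bsxfun(@le, commas, strbounds(:,2)), 1);
delimlocs = commas(~inside);                % [5 7 27]

% column of each string = 1 + number of real delimiters before its start
textcols = sum(bsxfun(@gt, strbounds(:,1), delimlocs), 2) + 1;   % [1; 3]
numcols = numel(delimlocs) + 1;             % 4
```

The comma at position 17 is discarded because it lies inside the second string's extents [9 25]. That is exactly why "fairhope, alabama" stays in a single column.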