Read text from pdf - any idea?
Posted: Mon Dec 11, 2023 2:37 pm
Does anyone have an idea, or maybe a sample, of how to read data from tables in a PDF file?
Exclusive forum for HMG, a Free / Open Source xBase WIN32/64 Bits / GUI Development System
http://hmgforum.com/
Code:
pdf.cpp:205:22: error: '_TCHAR' has not been declared
205 | int _tmain(int argc, _TCHAR* argv[])
| ^~~~~~
pdf.cpp: In function 'int _tmain(int, int**)':
pdf.cpp:233:74: warning: ISO C++ forbids converting a string constant to 'char*' [-Wwrite-strings]
233 | size_t streamstart = FindStringInBuffer (buffer, "stream", filelen);
| ^~~~~~~~
pdf.cpp:234:74: warning: ISO C++ forbids converting a string constant to 'char*' [-Wwrite-strings]
234 | size_t streamend = FindStringInBuffer (buffer, "endstream", filelen);
| ^~~~~~~~~~~
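Both messages above have well-known causes. `_TCHAR` and `_tmain` come from `<tchar.h>`, so either include that header or switch to a plain `int main(int argc, char* argv[])`. The `-Wwrite-strings` warnings appear because a string literal is passed to a parameter declared as `char*`; declaring the search argument `const char*` silences them. Below is a minimal sketch of a const-correct search helper. The body is my own stand-in, not the sample's original `FindStringInBuffer`, and returning `buffersize` when nothing is found is an assumption.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the sample's FindStringInBuffer.
// Returns the offset of the first occurrence of `search` in `buffer`,
// or `buffersize` when there is no match (assumed convention).
// memcmp is used instead of strstr because PDF data can contain NUL bytes.
size_t FindStringInBuffer(const char* buffer, const char* search, size_t buffersize)
{
    const size_t len = std::strlen(search);
    if (len == 0 || len > buffersize)
        return buffersize;
    for (size_t i = 0; i + len <= buffersize; ++i)
    {
        if (std::memcmp(buffer + i, search, len) == 0)
            return i;
    }
    return buffersize;
}
```

With this signature, calls such as `FindStringInBuffer(buffer, "stream", filelen)` compile without the warning.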
Code:
// Now use zlib to inflate:
z_stream zstrm; ZeroMemory(&zstrm, sizeof(zstrm));

zstrm.avail_in = streamend - streamstart + 1;
zstrm.avail_out = outsize;
zstrm.next_in = (Bytef*)(buffer + streamstart);
zstrm.next_out = (Bytef*)output;

int rsti = inflateInit(&zstrm);
if (rsti == Z_OK)
{
    int rst2 = inflate(&zstrm, Z_FINISH);
    if (rst2 >= 0)
    {
        // Ok, got something, extract the text:
        size_t totout = zstrm.total_out;
        ProcessOutput(fileo, output, totout);
    }
    // Release zlib's internal state (missing in the original snippet):
    inflateEnd(&zstrm);
}
delete[] output; output = 0;
// Advance the buffer past the stream that was just processed:
buffer += streamend + 7;
filelen = filelen - (streamend + 7);
Code:
!"#$"%&'$%()*&+,-./01&23"456$)3$75"&89:;<&:8=88:&>5?4(@7A@3"0)BCD&E:;&FGG&HIJ&H8;<&/2KD&J:J8I8L::H2/'&M"%6&NBO46$<;F&GIFI&GG:8&GIII&IIJI&J8GF&:L;F
>5?4(@7A@3"
P$)Q47)&3R4("3$)%$"DI8CI8C8I8L
S"("&4T*5)U"VRDI8CI8C8I8L
S"("&3R4("3$)%$"D
WT*5)U"37"D
/"#R37"D!"#$"%&'$%()*&+,-./01>B)"%&A@X4)&S3@*"6&Y%%"/2KD&J:J8I8L::H
23"456$)3$75"&89:;:8=88:&>5?4(@7A@3"
W$6@*46$)Z@&G[&:8=LII&PR456\3
/2KD&FHH&GJL&IL&8I!"6(X*"&]Y0&&8;:98I8L&@*RZ$%"^%"&T@U4("3$)&5"_\3$)%$"&1`&8L898I8L&5&U%$"&GJCIGC8I8L
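A guess about the output above: the byte values first appear in ascending order (`!`, `"`, `#`, `$`, ...), which is what an embedded subset font with its own glyph numbering tends to produce. If that is the case, the inflate step worked and the bytes are glyph codes rather than characters. The mapping back to Unicode would then live in the font's /ToUnicode CMap, if the PDF has one. Here is a minimal sketch of applying such a map, assuming single-byte codes, `bfchar` entries only (no `bfrange`), and targets in the Basic Multilingual Plane. All function names and the sample mapping in the usage note are hypothetical.

```cpp
#include <cstdio>
#include <map>
#include <sstream>
#include <string>

// Parse "<code> <unicode>" lines between beginbfchar/endbfchar in an
// already-inflated ToUnicode CMap. bfrange sections are ignored here.
std::map<unsigned, unsigned> ParseBfChar(const std::string& cmap)
{
    std::map<unsigned, unsigned> table;
    std::istringstream in(cmap);
    std::string line;
    bool inside = false;
    while (std::getline(in, line))
    {
        if (line.find("beginbfchar") != std::string::npos) { inside = true; continue; }
        if (line.find("endbfchar") != std::string::npos) { inside = false; continue; }
        unsigned code = 0, uni = 0;
        if (inside && std::sscanf(line.c_str(), " <%x> <%x>", &code, &uni) == 2
            && uni <= 0xFFFF)
        {
            table[code] = uni;
        }
    }
    return table;
}

// Append one BMP code point to `out` as UTF-8 (1 to 3 bytes).
void AppendUtf8(std::string& out, unsigned cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Translate raw glyph codes through the table; unmapped codes become '?'.
std::string DecodeGlyphs(const std::string& raw, const std::map<unsigned, unsigned>& table)
{
    std::string out;
    for (unsigned char c : raw)
    {
        std::map<unsigned, unsigned>::const_iterator it = table.find(c);
        if (it != table.end())
            AppendUtf8(out, it->second);
        else
            out += '?';
    }
    return out;
}
```

For example, with a made-up CMap containing `<21> <0046>` and `<22> <0061>`, `DecodeGlyphs("!\"!", table)` gives `FaF`. The real mapping has to come from the PDF itself.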
mol wrote: ↑Tue Dec 12, 2023 9:29 pm
I compiled the sample, but I'm getting garbage instead of text from the PDF. I have no idea how to continue this work...
I think it's possible to write this in pure Harbour, but I don't know how to decompress a text variable in memory, which compression method is used, etc.
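On the compression method: a PDF stream whose dictionary says `/Filter /FlateDecode` holds zlib-format data (RFC 1950, with deflate inside), which is why the C++ sample can pass it straight to `inflate()`. Other filters exist, so the stream dictionary should be checked first. As far as I know, Harbour exposes zlib through `hb_ZUncompress()`, but please verify that against your build. Below is a minimal sketch of a sanity check on the first two bytes of a stream before trying to inflate it. The function name is hypothetical.

```cpp
#include <cstddef>

// Returns true if the first two bytes form a valid zlib (RFC 1950) header
// that announces the deflate method. This only checks the header; it does
// not prove that the rest of the data inflates cleanly.
bool LooksLikeZlibStream(const unsigned char* data, size_t size)
{
    if (size < 2)
        return false;
    const unsigned cmf = data[0];
    const unsigned flg = data[1];
    if ((cmf & 0x0F) != 8)   // CM field: 8 means deflate
        return false;
    if ((cmf >> 4) > 7)      // CINFO field: window size up to 32 KiB
        return false;
    return ((cmf << 8) | flg) % 31 == 0;  // FCHECK makes the pair a multiple of 31
}
```

Typical zlib streams start with the bytes `78 9C`, `78 01` or `78 DA`. If the check fails right after the `stream` keyword, the offset is probably off (for example, the end-of-line after `stream` was not skipped) or the stream uses a different filter.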