Handling Large Text Files

Handling Large Text Files

Steve Chernyak
Hello

I need to handle a large text file and I'm wondering if it's possible to do it without reading the entire thing into memory and also without writing the entire thing out to disk in clear text.

Encrypted, the file is only 4 megs. However, once decrypted it balloons out to a few hundred megs. The file consists of rows of data.

Is it possible to decrypt the file in relatively small chunks and look for the row delimiter character? Once the character is found, I would process the row and discard the data, continuing until the entire file is processed. Sort of like processing SQL query results with a row handler.

The examples I have seen deal with binary data using a byte array as a buffer. Is it safe to convert the buffer to a string, or is it possible the buffer will end mid-character?

Is there any documentation/examples available for doing what I want?

Thanks

Re: Handling Large Text Files

David Hook-3

The CMS streaming API is probably what you want to be using. Have a look
at the tests around CMSEnvelopedDataStreamGenerator. It also sounds as
though the data needs to be compressed (or is already compressed); CMS
supports this as well.

Otherwise you could certainly roll your own using the Cipher streaming
classes.
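
A minimal sketch of the roll-your-own route, using only the standard JCE
streaming classes rather than the CMS API. The class name
StreamingRowReader, the method processRows, the AES/CBC/PKCS5Padding
transformation, the UTF-8 encoding and the newline row delimiter are all
assumptions made for illustration; none of them come from the thread.
Wrapping the CipherInputStream in an InputStreamReader lets the reader do
the character decoding, and it carries an incomplete multi-byte sequence
over to the next read, so a buffer that ends mid-character is not a
problem here. Only one row is held as text at a time and nothing is
written to disk.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.function.Consumer;

// Hypothetical helper, not part of Bouncy Castle or the JDK.
public class StreamingRowReader {

    /**
     * Decrypts the stream on the fly and hands each row to rowHandler.
     * Returns the number of rows processed.
     */
    public static long processRows(InputStream encrypted, byte[] key, byte[] iv,
                                   Consumer<String> rowHandler)
            throws IOException, GeneralSecurityException {
        // Assumption: AES in CBC mode with PKCS5 padding; substitute the
        // transformation and key handling the file was actually written with.
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

        long rows = 0;
        // The reader decodes bytes to characters, so rows are never built
        // from a raw byte buffer that might end in the middle of a character.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new CipherInputStream(encrypted, cipher), StandardCharsets.UTF_8))) {
            String row;
            // Assumption: rows are newline-delimited. readLine() strips the
            // line terminator and returns null at end of stream.
            while ((row = reader.readLine()) != null) {
                rowHandler.accept(row);
                rows++;
            }
        }
        return rows;
    }
}
```

A caller would pass something like a FileInputStream on the encrypted
file plus a lambda that handles one row. If the row delimiter is some
other character, the readLine() loop would need to be replaced with a
loop over reader.read() that accumulates characters until the delimiter
is seen.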

Regards,

David

On 13/09/18 12:12, Steve Chernyak wrote:

> Hello
>
> I need to handle a large text file and I'm wondering if it's possible to do it without reading the entire thing into memory and also without writing the entire thing out to disk in clear text.
>
> Encrypted, the file is only 4 megs. However, once decrypted it balloons out to a few hundred megs. The file consists of rows of data.
>
> Is it possible to decrypt the file in relatively small chunks and look for the row delimiter character. Once the character is found, process the row and discard the data. Continue to do this until the entire file is processed. Sort of like processing SQL query results with a row handler.
>
> I see examples have code for dealing with binary data using a byte array as a buffer. Is it safe to convert the buffer to a string or is it possible the buffer will end mid character?
>
> Is there any documentation/examples available for doing what I want?
>
> Thanks
>