Following the discussion on subsampling sequences from fasta/fastq files, I think it is perhaps time to advertise my in-house tool more openly: seqtk. Currently, seqtk supports quality-based trimming with the phred algorithm, converting fastq to fasta, reverse-complementing sequences, extracting or masking subsequences in regions given in a BED or name-list file, and more. I have just added a subsampling module that samples either exactly n sequences or a fraction of the sequences.
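To make the two subsampling modes concrete, here is a minimal Python sketch. It is not seqtk's code and I am not claiming seqtk works this way internally: it assumes reservoir sampling for the "exactly n" mode and an independent coin flip per record for the "fraction" mode. The names `sample_exact`, `sample_fraction` and the `Record` tuple are made up for illustration.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type for this sketch: (name, sequence).
Record = Tuple[str, str]


def sample_exact(records: Iterable[Record], n: int, seed: int = 11) -> List[Record]:
    """Reservoir sampling: keep exactly n records (or all of them if there
    are fewer than n) in a single pass, holding only n records in memory."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(rec)
        else:
            # Replace a random slot with probability n / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir


def sample_fraction(records: Iterable[Record], frac: float, seed: int = 11) -> Iterator[Record]:
    """Keep each record independently with probability frac.
    The output size is only approximately frac * total."""
    rng = random.Random(seed)
    for rec in records:
        if rng.random() < frac:
            yield rec
```

Fixing the seed matters in practice: running the same sampler with the same seed over two paired-end files picks the same record positions in both, so the pairs stay matched. Note that the reservoir version does not preserve input order.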
Seqtk supports both fasta and fastq input files, which can optionally be gzip-compressed. Each module is perhaps the most efficient among tools with the same functionality. For example, I know its fastq-to-fasta conversion is 10X faster than another converter, while being more flexible.
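For readers wondering what "fasta or fastq, optionally gzipped" looks like from the reading side, here is a rough Python sketch. It is an illustration only, not seqtk's parser: it detects gzip by the two-byte magic number, decides between fasta and fastq from the first character of the file, and assumes fastq records are exactly four lines each (multi-line fasta is handled). The function names `open_maybe_gzip` and `read_fastx` are hypothetical.

```python
import gzip
from typing import Iterator, Optional, Tuple


def open_maybe_gzip(path: str):
    """Open a text file, transparently decompressing it if it is gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastx(path: str) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (name, sequence, quality); quality is None for fasta records.
    Assumption of this sketch: fastq records span exactly four lines."""
    with open_maybe_gzip(path) as fh:
        first = fh.readline()
        if not first:
            return  # empty file
        if first.startswith("@"):
            header = first
            while header.strip():
                seq = fh.readline().rstrip()
                fh.readline()  # skip the '+' separator line
                qual = fh.readline().rstrip()
                yield header[1:].rstrip(), seq, qual
                header = fh.readline()
        elif first.startswith(">"):
            name, chunks = first[1:].rstrip(), []
            for line in fh:
                if line.startswith(">"):
                    # A new header closes the previous record.
                    yield name, "".join(chunks), None
                    name, chunks = line[1:].rstrip(), []
                else:
                    chunks.append(line.rstrip())
            yield name, "".join(chunks), None
        else:
            raise ValueError("input is neither fasta nor fastq")
```

With a reader like this, converting fastq to fasta is just a matter of writing `>name` and the sequence for each record and dropping the quality string.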
Seqtk is implemented in a single .c file and two header files, and depends only on zlib. The source code is freely available here (MIT license):