The Problem
I am trying to write and test R script against some data from customer. But the data is too big, it would take a lot of time to load the data and run the script. So it would be to extract a small fraction from the original data.
The Solution
First extract the first line from the original csv file, write to destination file.
Get-Content big.csv -TotalCount 1 | Out-File -Encoding utf8 sample.txt
Notice that by default Out-File cmdlet or redirection command >> uses system default encoding when write to a file. Most application by default uses utf-8 or utf-16 to read data. Hence we use -Encoding utf8 here.
Then we read all lines except the first line: Get-Content big.csv | where {$_.readcount -gt 1 }
Then randomly select 1000 lines and append them to the destination file.
Get-Content big.csv | where {$_.readcount -gt 1 } | Get-Random -Count 100 | Out-File -Encoding utf8 -Append sample.txt
The Complete Script
Get-Content big.csv -TotalCount 1 | Out-File -Encoding utf8 sample.txt; Get-Content big.csv | where {$_.readcount -gt 1 } | Get-Random -Count 100 | Out-File -Encoding utf8 -Append sample.txt
Related Script: Get default system encoding
[System.Text.Encoding]::Default
[System.Text.Encoding]::Default.EncodingName
Resource
PSTip: Get-Random
Get-Random Cmdlet
I am trying to write and test R script against some data from customer. But the data is too big, it would take a lot of time to load the data and run the script. So it would be to extract a small fraction from the original data.
The Solution
First extract the first line from the original csv file, write to destination file.
Get-Content big.csv -TotalCount 1 | Out-File -Encoding utf8 sample.txt
Notice that by default Out-File cmdlet or redirection command >> uses system default encoding when write to a file. Most application by default uses utf-8 or utf-16 to read data. Hence we use -Encoding utf8 here.
Then we read all lines except the first line: Get-Content big.csv | where {$_.readcount -gt 1 }
Then randomly select 1000 lines and append them to the destination file.
Get-Content big.csv | where {$_.readcount -gt 1 } | Get-Random -Count 100 | Out-File -Encoding utf8 -Append sample.txt
The Complete Script
Get-Content big.csv -TotalCount 1 | Out-File -Encoding utf8 sample.txt; Get-Content big.csv | where {$_.readcount -gt 1 } | Get-Random -Count 100 | Out-File -Encoding utf8 -Append sample.txt
Related Script: Get default system encoding
[System.Text.Encoding]::Default
[System.Text.Encoding]::Default.EncodingName
Resource
PSTip: Get-Random
Get-Random Cmdlet