Sunday, April 28, 2019

Split file in Linux

To split a large file into smaller files with a fixed number of lines each:

split -l 10000 -d --additional-suffix=.csv original_file.csv ./splitfiles/file_part_

Here, -l sets the number of lines in each new file.
-d uses numeric suffixes for the new files, so they are named file_part_00.csv, file_part_01.csv, etc.
--additional-suffix appends the desired extension to each file name.
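
To see the effect on a small scale, here is a minimal sketch that splits a 25-line sample file into 10-line parts. The scratch directory and the sample.csv name are made up for illustration, and it assumes GNU split, since -d and --additional-suffix are GNU options. Note that the output directory has to exist already; split does not create it.

```shell
# Sketch: split a 25-line sample into 10-line parts (assumes GNU coreutils split).
workdir=$(mktemp -d)              # hypothetical scratch directory
mkdir -p "$workdir/splitfiles"    # split will not create the output directory

seq 1 25 > "$workdir/sample.csv"  # 25 lines of dummy data

split -l 10 -d --additional-suffix=.csv \
    "$workdir/sample.csv" "$workdir/splitfiles/file_part_"

# List the parts with their line counts
wc -l "$workdir"/splitfiles/file_part_*.csv
```

The last part simply holds whatever lines are left over, so it can be shorter than the others.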

***

Friday, April 26, 2019

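Removing the BOM (Byte Order Mark) from a file

To check whether your file has a BOM (byte order mark) in Linux, open it in binary mode. If a BOM is present, it shows up as extra characters at the very start of the first line.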

vi -b original_file.csv

This usually happens when the file is copied from Linux to Windows, edited there, and then saved.

If you're not sure if the file contains a UTF-8 BOM, then this (assuming the GNU implementation of sed) will remove the BOM if it exists, or make no changes if it doesn't.

sed '1s/^\xEF\xBB\xBF//' < original_file.csv > new_file.csv
You can also overwrite the existing file with the -i option:

sed -i '1s/^\xEF\xBB\xBF//' original_file.csv
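
If you would rather check for the BOM without opening an editor, the first three bytes of the file can be inspected directly. The sketch below is only an illustration: has_bom is a made-up helper name, the demo file is a throwaway temp file, and the sed call assumes GNU sed as above.

```shell
# Sketch: report whether a file starts with the UTF-8 BOM bytes EF BB BF.
has_bom() {
    # Dump the first three bytes as hex and strip spaces/newlines, e.g. "efbbbf"
    first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first3" = "efbbbf" ]
}

# Demo on a throwaway file; \357\273\277 is EF BB BF in octal
demo=$(mktemp)
printf '\357\273\277id,name\n' > "$demo"

has_bom "$demo" && echo "BOM present"
sed -i '1s/^\xEF\xBB\xBF//' "$demo"   # same GNU sed command as above
has_bom "$demo" || echo "BOM removed"
```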

***

Another easy method:

Verify that the file has a BOM with
vi -b original_file.txt

Then open the file normally with
vi original_file.txt
and run
:set nobomb
:wq

Verify the file again with vi -b original_file.txt

***

Thursday, April 25, 2019

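Detecting a file's character set and confidence level in Linux

On Linux, run

chardetect original_file.txt
This reports the detected character set along with a confidence level between 0 and 1. The chardetect command is installed with the Python chardet package.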

Example:
$ chardetect script.sh
script.sh: ascii with confidence 1.0

$ chardetect file_part_41.csv
file_part_41.csv: ISO-8859-2 with confidence 0.587938805636
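
When chardetect is run over many files, low-confidence results like the ISO-8859-2 one above are easy to miss. Here is a small sketch that flags them, assuming the output format shown in the examples, with the confidence as the last field. The low_confidence name and the 0.9 default threshold are arbitrary choices for illustration.

```shell
# Sketch: succeed (exit 0) when a chardetect output line reports a
# confidence below the threshold.
low_confidence() {
    # $1 = one line of chardetect output
    # $2 = threshold, defaults to 0.9 (an arbitrary cut-off)
    printf '%s\n' "$1" |
        awk -v t="${2:-0.9}" '{ exit !($NF + 0 < t + 0) }'
}

# Demo with the sample line from above
low_confidence "file_part_41.csv: ISO-8859-2 with confidence 0.587938805636" \
    && echo "low confidence, check this file"

# Possible usage over real files (needs chardetect installed):
#   chardetect *.csv | while IFS= read -r line; do
#       low_confidence "$line" && echo "CHECK: $line"
#   done
```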

***

Wednesday, April 24, 2019

Adding/prepending text to Linux files

To add text at the start of a file, use the sed command:

sed -i '1i Header of the file' original_file.txt

Here, 1i means the string that follows is inserted before the first line of the file.

If you need to do the same for multiple files, a small shell script with execute permission will do.

Example: 

for file in *.csv; do
        sed -i "1i Adding this to the starting of the file." "$file"
done

Caution: If you add a header to all files after a split, the first file will end up with a duplicate header, because it already contains the original one.
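
One way to sidestep the duplicate header is to skip the first part and reuse its header line for the rest. This is only a sketch under a few assumptions: the parts follow the file_part_NN.csv naming from the split post, the first part is file_part_00.csv and already has the header, and the header contains no backslashes (sed would interpret them). add_header_to_parts is a made-up function name.

```shell
# Sketch: prepend the header to every split part except the first one,
# which already contains the original header line.
add_header_to_parts() {
    dir=$1                                  # directory holding the parts
    first="$dir/file_part_00.csv"           # assumed name of the first part
    header=$(head -n 1 "$first")            # reuse the real header line

    for file in "$dir"/file_part_*.csv; do
        [ "$file" = "$first" ] && continue  # skip the part that already has it
        sed -i "1i $header" "$file"         # GNU sed one-line insert
    done
}

# Possible usage:
#   add_header_to_parts ./splitfiles
```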

***

Saturday, April 20, 2019

Clear bash_history permanently

To clear a user's bash history permanently, log in to that user account and run the command below.

cat /dev/null > ~/.bash_history && history -c && exit

***

Saturday, April 6, 2019

Solr Admin Handler error

Below is the warning written to the system log file in DSE when solrconfig.xml still declares the Solr admin handler class, as carried over from versions before 6.x.

WARN [SecondaryIndexManagement:6] 2016-04-03 17:15:11,423 AdminHandlers.java:103 - <requestHandler name="/admin/"
class="solr.admin.AdminHandlers" /> is deprecated . It is not required anymore

Cause:
This happens because solr.admin.AdminHandlers has been removed in Solr 6.x (SOLR-7388).

In solrconfig.xml, remove the line below.
<requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
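
If you prefer to script the edit, something along these lines could work. It is a sketch, not a DSE-provided tool: remove_admin_handlers is a made-up function name, and it assumes the handler is declared on a single line as shown above. If the element spans several lines in your file, edit it by hand instead.

```shell
# Sketch: remove the AdminHandlers line from a solrconfig.xml file,
# assuming the whole <requestHandler .../> element sits on one line.
remove_admin_handlers() {
    conf=$1                                   # path to solrconfig.xml
    if grep -q 'solr\.admin\.AdminHandlers' "$conf"; then
        cp "$conf" "$conf.bak"                # keep a backup first
        sed -i '/solr\.admin\.AdminHandlers/d' "$conf"
        echo "removed AdminHandlers entry from $conf"
    else
        echo "no AdminHandlers entry in $conf"
    fi
}

# Possible usage:
#   remove_admin_handlers ./solrconfig.xml
```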

Then restart the DSE service on all nodes.

***