Omg it’s sooo slow, it takes around 30 seconds to bulk-insert 15,000 rows.
Disabling indices doesn’t help. The database recovery model is SIMPLE. My table is 50 columns wide, and from what I understand the main culprit is the stupid limit of 2100 parameters per query in the ODBC driver. I am using the .NET SqlBulkCopy, and I only open the connection + transaction once per ~15,000 inserts.
I have 50 million rows to insert and it takes literally days. Please send help, I can write with a pen and paper faster than this damned Microsoft driver inserts rows.
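For context on why that 2100-parameter ceiling bites so hard with a 50-column table: each row of a parameterized multi-row INSERT costs one parameter per column, so a single statement can only carry a few dozen rows. A quick back-of-the-envelope sketch (pure Python, using the numbers from the post):

```python
# Each row in a parameterized INSERT consumes one parameter per column,
# so the 2100-parameter cap limits how many rows fit in one statement.
PARAM_LIMIT = 2100   # SQL Server / ODBC parameter cap per statement
COLUMNS = 50         # table width from the post

rows_per_statement = PARAM_LIMIT // COLUMNS
statements_needed = -(-15_000 // rows_per_statement)  # ceiling division

print(rows_per_statement)  # 42 rows max per batched INSERT
print(statements_needed)   # 358 statements just for one 15k batch
```

So even a "batched" parameterized insert turns one 15k-row load into hundreds of round trips, which is exactly where per-statement latency starts to dominate.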
A friend of a friend found that exporting to CSV and importing it is the fastest route. Honestly crazy, but I recreated a test and it’s actually a little faster (when dumping and recreating the whole table, YMMV when inserting).
I’m not 100% sure it was MSSQL, though.
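If anyone wants to try the CSV route, the export side is cheap with the stdlib. A minimal sketch (the rows and filename here are made up; in practice this would be the batch you’re currently pushing through SqlBulkCopy, and the output is something `BULK INSERT` or bcp could then load):

```python
import csv

# Hypothetical sample rows standing in for a real 15k-row batch.
rows = [
    (1, "alpha", 3.14),
    (2, "beta", 2.72),
]

# newline="" is required so csv.writer controls the line endings itself.
with open("batch.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)  # no header row: bulk loaders expect raw data
```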
What is your latency? Can you move the data closer to where the DB is (cloud)? Did you change the isolation level? Or the recovery model? Did you try bcp? Any indexes on the table should be dropped first.
I’ve done a lot of work in this area, and no, that is not normal.
A few things: first, SQL Server has tools for migrating data that are pretty fast, and SqlBulkCopy can use some of them. Check whether the built-in DB tools are better for this.
SqlBulkCopy can handle way more than 15,000 records.
Why are you wrapping a data dump in a transaction? That will slow things down for sure.
You generally shouldn’t be running queries so huge that you’re nearing the parameter limit.
Can you share the code?
I timed the transaction and the opening of the connection; it takes maybe 100 milliseconds, which absolutely doesn’t explain the abysmal performance.
The transaction is needed because two tables are touched, and I don’t want to deal with partially inserted data.
I cannot share the code, but it’s Python calling .NET through “clr”, using SqlBulkCopy.
What do you suggest I use instead? It’s either a prepared query with thousands of parameters, or a plain-text string with the values inlined (which, admittedly, I didn’t try; might be faster lol).
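If the prepared-query route stays on the table, the rows have to be chunked so the total placeholder count never exceeds 2100. A sketch of what that statement builder might look like (table and column names here are hypothetical, and `?` is the ODBC placeholder style):

```python
PARAM_LIMIT = 2100  # SQL Server / ODBC parameter cap per statement

def build_insert(table, columns, n_rows):
    """Build one multi-row INSERT ... VALUES statement with ? placeholders,
    capping the row count so total parameters stay within the limit."""
    max_rows = PARAM_LIMIT // len(columns)
    n_rows = min(n_rows, max_rows)
    row_ph = "(" + ", ".join("?" * len(columns)) + ")"
    values = ", ".join([row_ph] * n_rows)
    cols = ", ".join(columns)
    return f"INSERT INTO {table} ({cols}) VALUES {values}", n_rows

sql, rows_used = build_insert("wide_table", [f"c{i}" for i in range(50)], 15_000)
print(rows_used)       # 42: all a 50-column table gets under the cap
print(sql.count("?"))  # 2100 placeholders total
```

This mostly illustrates why the approach struggles: one 15k batch still decomposes into hundreds of these statements.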
One thing to know about transactions is that they track the data and then write it; it’s not the opening that slows things down. A question, though: what is your source data? Do you have a big CSV or something? Can you do a DB-to-DB transfer instead? There’s also a tool called the BCP utility.
Edit: SQL Server/SSMS have tools for doing migrations and batch imports.
It’s been a little while since I worked on ODBC stuff, but I have a couple of thoughts:
- Would it be possible to use something like a table function on the DB side to simplify the query from the ODBC side?
- I could be misremembering, but I feel like looping through individual inserts over an open connection was faster than trying to submit the data in bulk when inserting that much in one shot. Might be worth doing a benchmark in a test DB and table to confirm.
I know I was able to insert more than 50M rows in a matter of single-digit hours, but unfortunately I no longer have access to that codebase to double-check the specifics.
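The benchmark idea is easy to smoke-test without touching SQL Server at all. Here’s a minimal harness using an in-memory SQLite database as a stand-in, purely to illustrate the shape of the comparison (loop vs. driver-side batching over one open connection), not to predict MSSQL numbers:

```python
import sqlite3
import time

def bench(fn, rows):
    """Time an insert strategy against a fresh in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    start = time.perf_counter()
    fn(conn, rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count

def loop_inserts(conn, rows):
    # One statement per row, connection held open throughout.
    for r in rows:
        conn.execute("INSERT INTO t VALUES (?, ?)", r)

def bulk_inserts(conn, rows):
    # Driver-side batching of the same statement.
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

rows = [(i, f"row{i}") for i in range(15_000)]
for name, fn in [("loop", loop_inserts), ("executemany", bulk_inserts)]:
    elapsed, count = bench(fn, rows)
    print(f"{name}: {count} rows in {elapsed:.3f}s")
```

Swapping the connection for a real SQL Server one (e.g. via pyodbc) would turn this into the benchmark suggested above.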
Try BCP. I’m fairly new to the Microsoft landscape too, but I found that using BCP really helped with loading efficiency.
I will try bcp. Somehow I was convinced I had to have access to the machine running SQL Server to use it, but from the docs I see I can specify a remote host… Will report back! EDIT: I can’t install bcp because it is only distributed with SQL Server itself, and I cannot install that on my corporate laptop.