Leslie mentioned generating a bridge table column from a CRC32 (presumably using the CHECKSUM() function). Is there a specific reason to use that function instead of HASHBYTES()?
@leslie-welch5 күн бұрын
@JBartlett_DMU Not that I am aware of, but I will ask the engineering team! I'll circle back and let you know what they say. The actual SQL function we are using in Databricks is CRC32() and it is wrapped around some array functions to get a distinct list and sort that distinct list. Things brings us down to the smallest number of records needed to meet the ListID need for our many to many relationships.
@JBartlett_DMU5 күн бұрын
@@leslie-welch Ahhh, so it's a Databricks SQL thing! I think I was assuming you were talking about T-SQL.
@leslie-welch5 күн бұрын
@@JBartlett_DMU gotcha. My understanding is that Databricks is primarily based on Spark SQL with some additional functions specific to Delta tables. I haven't worked with T-SQL to my knowledge so I don't know exactly how it differs.
@coolblue592910 күн бұрын
What about multiple fact scenarios like sales orders and invoices. Parts of a sales order can be dispatched and invoices separately. Even sales order lines can be split with second dispatch for back-ordered quantity. There is a need to link sales order lines to invoice lines.
@leslie-welch9 күн бұрын
What is the relationship between the two? Is it always one or more invoices per sales order, or are there also sometimes multiple sales orders associated with one invoice?
@mrbartuss19 күн бұрын
What about shared dimensions?
@coolblue59297 күн бұрын
@@leslie-welch in my case it’s always one sales order per invoice and sometimes multiple invoices for a sales order.
@coolblue59297 күн бұрын
@@mrbartuss1 the shared dimensions filter via the sales orders.
@leslie-welch5 күн бұрын
@@coolblue5929 In this scenario I would look at incorporating the sales order fact into the invoice fact. It will of course create some replication at the row level of sales data, but if you can end up with a fact that is mainly dates and integers, and push as much of the text into dimensions, your DAX will be simpler and overall the report should be more performant.