Google Machine Learning Engineer Python Interview

Рет қаралды 16,225

Күн бұрын

Пікірлер

@iqjayfeng 2 жыл бұрын

Here's the full question with a code editor if you want to try it out!: www.interviewquery.com/questions/automatic-histogram

@Snorrason 2 жыл бұрын

Nicely done! Just to note that none of the tests cover cases where a bin would have zero values. At a quick glance, it could be remedied by initializing a `count` variable to zero (instead of`histogram[bucket_str] = 0`) before the last for-loop. Once you break out of the loop, you can check if the count is non-zero before adding it to the histogram.

@yemanbrhane3900 3 ай бұрын

The number of bins should be (max -min)/x

@tokpot 2 жыл бұрын

The solution is wrong... and it's quite easy to test how : dataset = [7,8,9] x = 2 returned : {7-9 : 3} here's a pretty much 'coverall' solution using no libraries : def automatic_histogram(dataset, x): histogram = {} if len(dataset)==0: return histogram if len(set(dataset))>x: return histogram min_number = dataset[0] max_number = dataset[0] for i in dataset: if i < min_number: min_number = i if i > max_number: max_number = i if min_number==max_number: return {str(min_number):len(dataset)} width = (max_number-min_number)//x edges = [min_number, *[min_number + (i+1)*(width+1) for i in range(x-1)]] bins = [str(edges[i])+'-'+str(edges[i+1]-1) if edges[i]!=edges[i+1]-1 else str(edges[i]) for i in range(x-1)] if edges[-1]==max_number: bins.append(str(max_number)) else: bins.append(str(edges[-1])+'-'+str(max_number)) for b in bins: histogram[b] = 0 for i in range(x-1): for d in dataset: if edges[i]

@DanielMichaelFrees 4 ай бұрын

I believe this is forgetting to drop bins with count 0. I also wonder if an interviewer might expect the function to adjust bins for count 0 so that we always achieve number of bins = x if possible (this would be an uglier question if so).

@piyushagarwal8330 2 жыл бұрын

Started with not using sort to decrease complexity from Onlogn then added a for loop in while loop for On². Btw very nice solution, wasn't able to solve myself.

@Snorrason 2 жыл бұрын

Fortunately, the while loop goes closer to O(x * bucket_width) in time complexity, an improvement on O(n²).

@danielaldana6957 2 жыл бұрын

What level of difficulty is this one?

@nedaebrahimi6735 2 жыл бұрын

instead of using loops for calculating min and max we could easily use the min(dataset) and max(dataset) and use Counter to count the number of elements in the dataset. here is my solution: def histogram(data, x): res={} min_value=min(data) max_value= max(data) bins= math.ceil((max_value)/x) counts= Counter(data) if max_value==min_value: return {str(max_value):counts[max_value]} i=min_value while i

@abyszero8620 2 жыл бұрын

My kingdom for a space bar

@jp_magal 2 жыл бұрын

I liked both question and solution! But I don't think it would pass test case with a dataset containing only negative values, and the question doesn't state there can't be negatives, does it?

@d-rex7043 2 жыл бұрын

A histogram is usually analogous to a probability distribution and you can't have a negative probability

@DilanChecker Жыл бұрын

Not true. Example: a voltage measurement. It is normal distributed since you have statistical errors in the measurement, but it can have negative values or even a mix if you measure close to V=0

@AhmedMahmoud-wz6du 2 жыл бұрын

Nice Question ! and smart answer 😄

@DanielMichaelFrees 4 ай бұрын

My solution (using a map for easy value counting, without importing Counter). I believe it is fairly robust, though the question has some ambiguity. In particular around how to handle inexact division (though we can infer from the example that we want to drop extra range from the final bucket most likely. Additionally, the following solution will allow for the 'wrong' number of buckets in cases where we have buckets with value 0. That behavior is unclear from the directions. def automatic_histogram(dataset: List[int], x: int) -> dict: """ @param dataset(List[int]) the numbers to populate the histogram @param x(int): the number of bins for the histogram @return dict: dictionary containing keys that are string ranges spread uniformly over the dataset (except possibly a smaller final range in the case of odd # of input values (as a set)) """ # dataset = sorted(dataset) # make it quick to extract min, max, and iterate # # using O(nlogn) complexity once if x < 1: raise ValueError("Must have at least 1 bin") if not dataset: return {} # O(2n) min_val = min(dataset) max_val = max(dataset) val_range = max_val + 1 - min_val # e.g. (5+1-1) = 5 bin_size = ceil(val_range / x) print(bin_size) bin_range = x * bin_size extra_vals = bin_range - val_range # e.g. 5-3 = 2 histogram = {} value_map = {} #maps integers to their corresponding key in the histogram upper_val = None lower_val = min_val while upper_val != max_val: upper_val = lower_val + bin_size - 1 if upper_val > max_val: # bin range may be larger than value range upper_val = max_val if lower_val == upper_val: key = f"{lower_val}" value_map[lower_val] = key histogram[key] = 0 # single int value range else: key = f"{lower_val}-{upper_val}" for val in range(lower_val, upper_val+1, 1): value_map[val] = key histogram[key] = 0 lower_val += bin_size for val in dataset: key = value_map[val] histogram[key] += 1 # add count for this val to corresponding bin # removal of zero values from histogram final_histogram = {} for key, val in histogram.items(): if val == 0: continue final_histogram[key] = val return final_histogram

@teslarocks 3 ай бұрын

The solution provided is not fully correct. I understand the stress under timed interview. Here's my solution given ample time. Thank you, both the interviewer and interviewee for the mock interview. It's quite helpful to me. from collections import defaultdict import math def automatic_histogram(dataset, x): min_num = dataset[0] max_num = dataset[0] freq_count = defaultdict(int) for d in dataset: if d > max_num: max_num = d if d < min_num: min_num = d freq_count[d] += 1 bin_size = math.ceil((max_num - min_num) / x) histogram = defaultdict(int) for n, f in freq_count.items(): bin = (n-min_num) // bin_size if n == max_num and (max_num-min_num)%bin_size == 0: bin_str = str(max_num) else: bin_str = str(min_num + bin*bin_size) + '-' + str(min_num + bin*bin_size + bin_size-1) histogram[bin_str] += f return dict(histogram) ### Tests: x = 4 dataset = [-4,-2,0,1,2,2,5] res = automatic_histogram(dataset, x) print(res) # Answer: {'-4--2': 2, '-1-1': 2, '2-4': 2, '5': 1} x = 5 dataset = [1,2,3,4,5,6,7,8,9] res = automatic_histogram(dataset, x) print(res) # Answer: {'1-2': 2, '3-4': 2, '5-6': 2, '7-8': 2, '9': 1} x = 3 dataset = [1,2,4,4,5,6,6,8,9] res = automatic_histogram(dataset, x) print(res) # Answer: {'1-3': 2, '4-6': 5, '7-9': 2}

@amanmiglani6043 2 жыл бұрын

Sir where we can get such type of questions? I hardly fined it on internet And now days many companies started asking in there 1st round of coding such type of questions

@iqjayfeng 2 жыл бұрын

Check out Interview Query!