> At one point, we wanted a command that would print a hundred random lines from a dictionary file. I thought about the problem for a few minutes, and, when thinking failed, tried Googling. I made some false starts using what I could gather, and while I did my thing—programming—Ben told GPT-4 what he wanted and got code that ran perfectly.
I mean ... IDK, if you can't write that on your own without research, I don't think you should be writing a eulogy for programming in the New Yorker.
(I just wrote two versions, one using python's random.sample and the other more verbosely, to double-check myself.)
I'm the opposite. I feel like it is true. I very much will NO LONGER THINK about "simple" problems.
My most recent thing is for a game I'm working on. I got a bunch of free sound FX from freesound.org, but they were AIFF format. I googled and found an online converter, then asked ChatGPT to write a script that uses ffmpeg to go through the directory converting the AIFFs to MP3s.
In the past if I would have written such a script it would have required me to "look into how to do it" and then I would have saved the script somewhere on my machine or online. Now I just deleted it right away. If I ever need to do it again or something similar I'll just ask ChatGPT and get it from scratch again.
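(For reference, the whole throwaway script is only a handful of lines. A minimal sketch in Python rather than shell, assuming ffmpeg with an MP3 encoder is on the PATH and the files end in .aiff:)

```python
import pathlib
import subprocess

# Convert every AIFF in the current directory to an MP3 alongside it.
for aiff in pathlib.Path(".").glob("*.aiff"):
    mp3 = aiff.with_suffix(".mp3")
    # ffmpeg picks the output codec from the .mp3 extension.
    subprocess.run(["ffmpeg", "-i", str(aiff), str(mp3)], check=True)
```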
Every time I've tried to get it to do something reasonably simple like this, it returns plausible-sounding code that doesn't work.
My last attempt to use it was to do with resolving circular calculations in Excel - I wanted some way of calculating the sum of a geometric series without having the series in actual cells. It kept recommending things that didn't work.
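(For what it's worth, a finite geometric series has the closed form a·(1 − rⁿ)/(1 − r) for ratio r ≠ 1, which needs only the first term, the ratio, and the term count in single cells rather than the series itself. A quick Python sanity check, with made-up numbers:)

```python
# Closed-form check: sum of a*r**i for i in 0..n-1 vs a*(1 - r**n)/(1 - r).
# The values of a, r, n here are made up purely for illustration.
a, r, n = 100.0, 0.9, 12
explicit = sum(a * r**i for i in range(n))
closed_form = a * (1 - r**n) / (1 - r)
print(explicit, closed_form)  # both print roughly 717.57
```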
I'm always amazed to hear of people who manage to get working code from an AI. This has not been my experience.
Work on your prompts. Break down the problem into pseudocode steps, the way you might write down comments for yourself on how to do something. Provide examples of similar working code if you can. Don't ask it to do everything at the same time, break it up into limited goal increments.
In my experience LLMs work best if you already know how to do something, but you just can't be bothered to actually do it.
When exploring unknown territory, keep your questions simple and ask for clarification on anything you don't understand. It will not just help you, but the AI as well.
yeah, sumproduct on google docs was the thing it kept suggesting, but not in a way that worked for the thing I wanted. I found a way around it in the end, but I feel I wasted 30 mins trying to get ChatGPT to understand what I wanted and give me something that actually worked.
Yeah, this author sounds more like an up-and-coming software developer. The cited example is something that someone should easily be able to do in their day-to-day language.
Then there's ChatGPT's ability to essentially translate/expand your knowledge out to other languages (e.g. “how do I read in a file to utf8 again”). It’s all just more leverage and power to solve the mundane, faster. The key point is you have to know the rough solution space already.
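(In Python, for instance, that particular incantation is just a keyword argument to open - a trivial sketch with a placeholder filename:)

```python
# Read a text file as UTF-8; "words.txt" is a placeholder path.
with open("words.txt", encoding="utf-8") as f:
    text = f.read()
```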
To be really fair, there are levels of programming. The easiest level is traditionally vulnerable to automation. There's a lot of "accidental complexity" (https://wikipedia.org/wiki/No_Silver_Bullet), e.g. ffmpeg, git; even in recalling syntax and idiom of an infrequently used tool, e.g. bash, jq.
Once upon a time, developers had to pore over manpages and paper textbooks, or just memorize things. Then came Google. Next, Stack Exchange. And now GPT-4, which is kind of a search engine plus rudimentary synthesis.
But let's be clear. This is low-hanging fruit. It's a poor substitute for actual thought. And surely we are safe.
I’ve been coding for 20 years commercially now, and I’ve come to define good coders - and indeed intelligence itself - as … adaptability.
Not fluid intelligence, not creative velocity, not creative originality, not oodles of domain knowledge … adaptability.
A new tool comes along? Adapt. A new idea breaks a long established paradigm? Adapt. A new skill is required? Adapt. As I get older adapting gets easier, because I treat ideas like objects - useful until they are not. No part of me cannot change, and my certainty in myself comes not from a bank of sanctified ideals I worship but in the knowledge that whatever comes, I will adapt and thrive. Fat client, thin client, centralised, edge, micro services, monolithic, ideas, tech, frameworks, everything is fluid and that’s what keeps things fun.
Same goes with the arrival of GPT: I embraced it, learnt its strengths and weaknesses, and now I estimate I’m maybe 2x more productive because of it, which means more time for fun things that interest me.
I know that last line is wry but I feel like people need the reminder:
The graveyards are filled to overflowing with companies that were challenged by low-end 'crappy' products and said 'they are going after the bottom of the market, we are safe'... a few years before they entered a terminal decline, and a decade before they went extinct.
The significant factor in "disruption" [Christensen] is not low-end products - incumbents typically crush them, and markets consolidate over time; many competitors become few. What's newsworthy and puzzling is when the low-end entrant does win. The key factor (IMHO from reading several of his books) is when incumbents cannot address low-end entrants, because their customers have captured them in some way (e.g. all aspects of the organization are optimized to their customers).
Thought experiment: how could this apply to human thought? i.e. what is a low-end that we couldn't address?
Disclaimer: I really like Christensen's reasoning, theories, and expression of them... but he did predict the iphone would not be disruptive. So, at best, difficult to apply!
I guess the point is that there are different levels of coders. For coders who can't write code that prints a hundred random lines from a dictionary file, GPT-4 is already a significant threat.
On the other hand, even if you can write it, does it mean you should write it? How much time does it cost you to write it? How much money is your company paying you for that amount of time? Is it more economical for GPT-4 to do it?
So I was curious and decided to ask mistral-7b-instruct, and it gave me the following:
> Here is an example of a Python script that reads a dictionary file and prints 100 random lines:
```python
import random

with open('dictionary.txt', 'r') as f:
    lines = f.readlines()

random_lines = random.sample(lines, 100)
for line in random_lines:
    print(line.strip())
```
> This script uses the `random` module to select a random subset of the lines in the file. The `open` function is used to open the file and read its contents into a list of strings, which are then passed to the `random.sample` function to select 100 random lines. Finally, the selected lines are printed using a for loop and the `strip` method to remove any leading or trailing whitespace.
> Note that this script assumes that the dictionary file is located in the same directory as the script and is named "dictionary.txt". You may need to modify the script to specify the correct path and filename for your dictionary file.
Except, I intentionally don’t use either form because they don’t extend nicely (and I dislike using flags when I could use another pipe segment or positional arguments when I could use standard input). I can iterate quickly by adding pipeline segments on top of the basic formula, because all of the segments have the same general shape and “calling convention”.
Finally, because I’ve built up familiarity with the shell over my career, I can come up with this formula as fast as I can type it. At this point, ChatGPT would slow me down: (1) because this sort of thing is basically muscle memory and (2) I have to actually think about the code ChatGPT produces to verify it is correct, which is almost as difficult as producing it in the first place.
> Use reservoir sampling to get 100 random lines from a dictionary that doesn’t fit into memory. Give me an implementation in Python.
> Reservoir sampling is a great choice for randomly selecting a fixed number of items from a stream of unknown size, especially when the entire dataset cannot fit into memory. Here's a Python implementation for selecting 100 random lines from a large file, like a dictionary, using reservoir sampling:
```python
import random

def reservoir_sampling(file_name, k):
    reservoir = []
    with open(file_name, 'r') as file:
        for i, line in enumerate(file):
            if i < k:
                # Fill the reservoir array
                reservoir.append(line.strip())
            else:
                # Randomly replace elements in the reservoir
                # with a decreasing probability.
                # Choose an integer between 0 and i (inclusive)
                j = random.randint(0, i)
                if j < k:
                    reservoir[j] = line.strip()
    return reservoir

# Example usage
file_path = 'path_to_your_large_file.txt'  # Replace with your file path
sampled_lines = reservoir_sampling(file_path, 100)
print(sampled_lines)
```
> This script opens a file and reads through it line by line. For the first 100 lines, it adds them directly to the reservoir. For each subsequent line, it randomly decides whether to replace an existing line in the reservoir with the new line. The probability of replacing decreases as it progresses through the file, ensuring each line has an equal chance of being included in the final sample.
I laid out the constraints, but I did not mention reservoir sampling at all. The script seems to work as expected when I run it against a dictionary file.
Not bad, but suppose the dictionary has n lines and you only want to randomly sample k=100 of them, where n is so huge that you don't want to scan over the whole file at all.
Can you use random access into the file to sample k lines in O(k) time instead of O(n) time?
That is a problematic request for multiple obvious reasons, and for those same reasons, ChatGPT resisted providing an implementation that didn't require indexing the file. When I told it "no indexing is allowed, provide a best effort solution", it relented and provided a best effort solution.
> That is a problematic request for multiple obvious reasons
I'd prefer to think it's more like a real engineering problem, and less like a simple interview question :-)
And it definitely shows the limits of GPT here: it pointed out that the ends of the file might be tricky, but ignored the very conceptually simple solution of considering the file as circular (if you go past either end you simply wrap around).
And it misses the real problem with its implementation: the probability of sampling each line is now directly proportional to the length of the line before it (because it seeks into that line first and then skips it!)
So the word after "begins" is twice as likely to come up as the word after "and".
PS in the case of dictionary words with a length limit of say 30 letters, there is still an O(k) general solution using rejection sampling.
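Roughly the idea, as a sketch (it assumes no line, newline included, exceeds max_line_len bytes, samples with replacement, and uses /usr/share/dict/words only as an example path):

```python
import os
import random

def sample_line_uniform(path, max_line_len=31):
    """Pick one line uniformly at random via seeks plus rejection sampling.

    Assumes no line (newline included) is longer than max_line_len bytes,
    so each draw touches O(1) bytes and k draws cost O(k) expected time.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        while True:
            pos = random.randrange(size)              # uniform random byte
            window_start = max(0, pos - max_line_len)
            f.seek(window_start)
            window = f.read(pos - window_start + 1)
            # The line containing `pos` starts right after the last newline
            # strictly before `pos` (or at the start of the file).
            prev_nl = window.rfind(b"\n", 0, len(window) - 1)
            line_start = 0 if prev_nl == -1 else window_start + prev_nl + 1
            f.seek(line_start)
            line = f.readline()
            # A line is landed on with probability proportional to its
            # length, so accept with probability 1/len(line) to even it out.
            if random.randrange(len(line)) == 0:
                return line.decode().rstrip("\n")

# 100 draws, with replacement; dedupe if distinct words are required.
words = [sample_line_uniform("/usr/share/dict/words") for _ in range(100)]
print(words)
```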
"Remember, this is a probabilistic approach and works well if the lines in your file are roughly the same length. If the line lengths vary significantly, some lines will have a higher or lower chance of being selected."
It had already addressed "the real problem with its implementation" that you pointed out.
> PS in the case of dictionary words with a length limit of say 30 letters, there is still an O(k) general solution using rejection sampling.
Again, what ChatGPT wrote:
"In a typical scenario where lines can have variable lengths, true O(k) random sampling isn't feasible without some prior knowledge about the file."
Knowing that the limit is 30 characters without question counts as "some prior knowledge".
As an interviewer, it sounds like you're not hearing what the candidate is saying.
> And it definitely shows the limits of GPT here
I don't think anyone here is claiming that ChatGPT is limitless. The topic is "a coder considers the waning days of the craft", not "a coder considers the bygone days of the craft." ChatGPT is capable of solving many real world problems already. If it continues improving, some people are concerned about what that could mean, especially for less experienced developers.
How many people have you interviewed with that brainteaser that have actually provided the complete solution you're looking for? Vanishingly few, I would imagine, unless you were dropping some serious hints. It's not a real world problem. Most brainteasers have solutions that are "conceptually simple" once you already know the solution.
> I'd prefer to think it's more like a real engineering problem, and less like a simple interview question
It's absolutely not, though. It's exactly like the infamous trick questions that many tech interviews are known for, which have nothing to do with real engineering that you would encounter on the job.
You might as well have someone invert a binary tree for all the value that it provides.
Yeah it’s not a challenging problem, but the specific code example isn’t the point. Replace his example with “at one point, we wanted a command that would convert a PNG to a JPG….” most of us couldn’t do that without Google.
Took me almost literally two seconds to come up with that, unless your point is that at one point I had to learn it which, yeah, sure. What am I missing?
As the parent comment says, the specifics of the example are irrelevant. Pretend you didn’t know that the dictionary file existed, or where it was stored, or that cat/shuf/head existed, or even how to compose programs in your shell.
That’s quite the assumption! You’d also have to know that your shell maintains a history and know how to search it. None of these things are obvious to those who do not already know them. We all had to learn at some point.
You're missing that some folk haven't learned that yet - and that others have forgotten it.
In 2013 I was using complex ffmpeg commands with filters in there, and could remember the stuff easily. Now I've gotta reference my ffmpeg notes more frequently.
> As of my last knowledge update in January 2022, I can't provide specific details about the latest XKCD comics or their numbers because my training data only includes information available up to that time. However, you can easily find XKCD comics on the official XKCD website (xkcd.com) by using the search function or by browsing through the archives.
> If there's a specific XKCD comic related to the "tar" command that you're looking for, you might be able to find it by searching for keywords like "XKCD tar command" on a search engine or directly on the XKCD website. The website has a good search feature that allows you to find comics based on keywords or topics.
With GPT-4 and browsing enabled, it searches the web and returns:
> The XKCD comic about using the `tar` command can be found at [XKCD Comic #1168](https://xkcd.com/1168/). The comic humorously addresses the complexities and challenges of remembering the correct syntax for the `tar` command, a common scenario for many who use Unix-based systems.
ffmpeg is kind of a special case; the CLI for that is far more complex than almost anything else I've ever seen.
tar xf for extract. Add v to show the files. tar cf to create. tar tf to "test" (list). Add -C dir at the end to extract to different directory. Set file to "-" for stdin.
That's all there's to it for 99% of use cases. I don't get how anyone can have difficulty with that: f for file, and x/c/t are pretty mnemonic.
You've missed the point of the example. We can remember complex things when they're close at hand, and forget even common things when they're not. GPT can help with both cases.
Well, convert does the conversion for you… it’s kind of challenging to come up with an example where there isn’t already a tool/package/library for X. But like pretend there wasn’t one for converting images… you can’t just off-the-cuff that.
Yeah, when I read that I also tried the first thing that popped into my mind - extremely crude/slow but works in bash and only took like 20 seconds without looking anything up:
cat /usr/share/dict/words | while read LINE; do echo "$RANDOM $LINE"; done | sort -h | head -n 100 | awk '{ print $2 }'
I thought something like that might exist and was tempted to check for that, but given the context I really wanted to do it without looking anything up. I see now there's also "shuf".
The problem is not that I don't know the algorithm... the problem is that to implement the algorithm you need to remember what method from what class does the thing you want to do. And knowing that is very difficult, even for seemingly simple problems. That's why Google is the way to program modern programming languages. (Of course I could do the same task without any libraries, but in that case it would be considered bad by most modern programmers, because it would be a sign that I "don't know how to use the language properly"...)
This isn't a complex problem - this is something you do with built-in standard libraries in pretty much any programming language. Like I would expect anyone who claims to know even basic file I/O in their given language to be able to produce a mostly working version of this in less than 30 minutes.
80% of the time they'll produce a subtly biased shuffle, most people don't intuitively know the fisher-yates method unless they've been taught it
like, try this
```python
import collections, random

def shuffle(xs):  # naive intuitive shuffle
    for i in range(len(xs)):
        j = random.randrange(len(xs))
        xs[i], xs[j] = xs[j], xs[i]

def first_of(n):
    xs = list(range(n))
    shuffle(xs)
    return xs[0]

print(collections.Counter([first_of(3) for i in range(100_000)]))
```
note that 1 comes out first 25% more often than 2, but with a fair shuffle all three should have an equal chance of being first
even without running it you can figure out logically that it must be biased in some way because there are 3! = 6 possible permutations and 3³ = 27 equally likely sequences of choices for j (each producing one of those permutations), and 27 is not divisible by 6, so some of those permutations have to be more likely than others
so i think this algorithm is a good one to get from a library if its randomness is important to you
not from chatgpt, which commonly does give you subtly buggy code
if you don't know the algorithm you could easily get it subtly wrong
but the only method involved here is string.strip(), so i don't think this is an example of the problem you describe
```python
import random

n = 100
ws = [line.strip() for line in open('/usr/share/dict/words')]
for i in range(n):
    j = random.randrange(i, len(ws))
    ws[i], ws[j] = ws[j], ws[i]
    print(ws[i])
```
(did i fuck that up)
and yeah probably calling random.sample would be better. and usually import random is better than writing your own rng. but not always
Every time I hear coding referred to as a craft I sort of hate it. On one hand I love the idea of it being a highly skilled craft - knowing the intricacies of the language and the machine to solve any problem, being a respected expert, producing a product worthy of pride. On the other hand, most of what we do is display [some content] on [some page] (probably sloppily due to deadlines), or print 100 random entries from a file...
Not really. My skills have drastically declined over the years because I rarely get a chance to exercise them. On top of that, it seems I'm switching languages/stacks frequently. So it's more like I'm a carpenter for a little while, then a plumber, then an electrician.
I wouldn't expect a carpenter to be skilled with a draw knife without substantial experience using it. The tools you use have a large impact on your performance. Same is true for software when it comes to understanding the intricacies of the tools, which is especially needed when troubleshooting odd issues.
I still maintain that switching from front end Angular to backend data engineering is more like switching from electrician to carpenter. They're both builders, but their tools and objectives are very different. If one doesn't see that, I question how deep into the craft they really are.
Yes, you can go very deep in both front and back end. But the concepts are almost the same... as someone that has had "careers" doing both pixel perfect UIs and large distributed systems on the backend (hundreds of machines, millions of events a second, blahblah).
You can definitely be skilled in a certain tool (language, framework) and that is valuable.
Or you can be the shallow generalist (handyman) running around doing whatever. But the specialist (electrician) is gonna make more than you installing the same outlets.
Especially since that particular task has been done thousands upon thousands of times. I have done it to win arguments on Hacker News[0]. It wouldn't surprise me if the exact code to do it (except for changing file names and number of lines) existed on Stack Overflow.
More likely, the reason why ChatGPT can do it is because all necessary code is available on StackOverflow.
I don't understand unbound excitement about using a paid online tool (!) that does not have reproducible source code (!!) and can sometimes (!!!) generate correct implementations of extremely well-known algorithms if given perfect description.
Somebody who claims 20 years of experience and doesn't at least know instantly (this doesn't require "thinking" about it for minutes) that this is a one-liner in the shell or a few lines of pick-your-poison script - Perl, Python, Ruby, who cares - is not an expert in programming, even if they can't remember the specifics.
And I get it, there are highly specialized people who get very deep in the weeds in maybe HPC or embedded; even so, most of them will at least still be aware. Moreover, these people lack the knowledge to comment on programming or software engineering in the large; I wouldn't call them experts outside their limited domain if they really can't begin to know where to look to do a fucking permutation of a word list.
The old saying that for some 20 years of experience is 1 year and 19 repeats probably applies.