> At one point, we wanted a command that would print a hundred random lines from a dictionary file. I thought about the problem for a few minutes, and, when thinking failed, tried Googling. I made some false starts using what I could gather, and while I did my thing—programming—Ben told GPT-4 what he wanted and got code that ran perfectly.
I mean ... IDK, if you can't write that on your own without research, I don't think you should be writing a eulogy for programming in the New Yorker.
(I just wrote two versions, one using python's random.sample and the other more verbosely, to double-check myself.)
I'm the opposite. I feel like it is true. I very much will NO LONGER THINK about "simple" problems.
My most recent thing is for a game I'm working on. I got a bunch of free sound FX from freesound.org, but they were AIFF format. I googled and found an online converter, then asked ChatGPT to write a script that uses ffmpeg to go through the directory converting the AIFFs to MP3s.
In the past if I would have written such a script it would have required me to "look into how to do it" and then I would have saved the script somewhere on my machine or online. Now I just deleted it right away. If I ever need to do it again or something similar I'll just ask ChatGPT and get it from scratch again.
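(For reference, the whole throwaway script is only a handful of lines. A minimal sketch in Python rather than shell, assuming ffmpeg with an MP3 encoder is on the PATH and the files end in .aiff:)

```python
import pathlib
import subprocess

# Convert every AIFF in the current directory to an MP3 alongside it.
for aiff in pathlib.Path(".").glob("*.aiff"):
    mp3 = aiff.with_suffix(".mp3")
    # ffmpeg picks the output codec from the .mp3 extension.
    subprocess.run(["ffmpeg", "-i", str(aiff), str(mp3)], check=True)
```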
Every time I've tried to get it to do something reasonably simple like this, it returns plausible-sounding code that doesn't work.
My last attempt to use it was to do with resolving circular calculations in Excel - I wanted some way of calculating the sum of a geometric series without having the series in actual cells. It kept recommending things that didn't work.
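(For what it's worth, a finite geometric series has the closed form a·(1 − rⁿ)/(1 − r) for ratio r ≠ 1, which needs only the first term, the ratio, and the term count in single cells rather than the series itself. A quick Python sanity check, with made-up numbers:)

```python
# Closed-form check: sum of a*r**i for i in 0..n-1 vs a*(1 - r**n)/(1 - r).
# The values of a, r, n here are made up purely for illustration.
a, r, n = 100.0, 0.9, 12
explicit = sum(a * r**i for i in range(n))
closed_form = a * (1 - r**n) / (1 - r)
print(explicit, closed_form)  # both print roughly 717.57
```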
I'm always amazed to hear of people who manage to get working code from an AI. This has not been my experience.
Work on your prompts. Break down the problem into pseudocode steps, the way you might write down comments for yourself on how to do something. Provide examples of similar working code if you can. Don't ask it to do everything at the same time, break it up into limited goal increments.
In my experience LLMs work best if you already know how to do something, but you just can't be bothered to actually do it.
When exploring unknown territory, keep your questions simple and ask for clarification on anything you don't understand. It will not just help you, but the AI as well.
yeah, sumproduct on google docs was the thing it kept suggesting, but not in a way that worked for the thing I wanted. I found a way around it in the end, but I feel I wasted 30 mins trying to get ChatGPT to understand what I wanted and give me something that actually worked.
Yeah, this author sounds more like an up-and-coming software developer. The cited example is something that someone should easily be able to do in their day-to-day language.
Then there's ChatGPT's ability to essentially translate/expand your knowledge out to other languages (e.g. “how do I read in a file to utf8 again”). It’s all just more leverage and power to solve the mundane, faster. The key point is you have to know the rough solution space already.
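(In Python, for instance, that particular incantation is just a keyword argument to open - a trivial sketch with a placeholder filename:)

```python
# Read a text file as UTF-8; "words.txt" is a placeholder path.
with open("words.txt", encoding="utf-8") as f:
    text = f.read()
```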
To be really fair, there are levels of programming. The easiest level is traditionally vulnerable to automation. There's a lot of "accidental complexity" (https://wikipedia.org/wiki/No_Silver_Bullet), e.g. ffmpeg, git; even in recalling syntax and idiom of an infrequently used tool, e.g. bash, jq.
Once upon a time, developers had to pore over manpages and paper textbooks, or just memorize things. Then came Google. Next, Stack Exchange. And now GPT-4, which is kind of a search engine plus rudimentary synthesis.
But let's be clear. This is low-hanging fruit. It's a poor substitute for actual thought. And surely we are safe.
I’ve been coding for 20 years commercially now, and I’ve come to define good coders - and indeed intelligence itself - as … adaptability.
Not fluid intelligence, not creative velocity, not creative originality, not oodles of domain knowledge … adaptability.
A new tool comes along? Adapt. A new idea breaks a long established paradigm? Adapt. A new skill is required? Adapt. As I get older adapting gets easier, because I treat ideas like objects - useful until they are not. No part of me cannot change, and my certainty in myself comes not from a bank of sanctified ideals I worship but in the knowledge that whatever comes, I will adapt and thrive. Fat client, thin client, centralised, edge, micro services, monolithic, ideas, tech, frameworks, everything is fluid and that’s what keeps things fun.
Same goes with the arrival of GPT: I embraced it, learnt its strengths and weaknesses, and now I estimate I’m maybe 2x more productive because of it, which means more time for fun things that interest me.
I know that last line is wry but I feel like people need the reminder:
The graveyards are filled to overflowing with companies that were challenged by low-end 'crappy' products and said 'they are going after the bottom of the market, we are safe'... a few years before they entered a terminal decline, and a decade before they went extinct.
The significant factor in "disruption" [Christensen] is not low-end products - incumbents typically crush them, and markets consolidate over time; many competitors become few. What's newsworthy and puzzling is when the low-end entrant does win. The key factor (IMHO from reading several of his books) is when incumbents cannot address low-end entrants, because their customers have captured them in some way (e.g. all aspects of the organization are optimized to their customers).
Thought experiment: how could this apply to human thought? i.e. what is a low-end that we couldn't address?
Disclaimer: I really like Christensen's reasoning, theories, and expression of them... but he did predict the iphone would not be disruptive. So, at best, difficult to apply!
I guess the point is that there are different levels of coders. For coders who can't write code that prints a hundred random lines from a dictionary file, GPT-4 is already a significant threat.
On the other hand, even if you can write it, does it mean you should write it? How much time does it cost you to write it? How much money is your company paying you for that amount of time? Is it more economical for GPT-4 to do it?
So I was curious and decided to ask mistral-7b-instruct, and it gave me the following:
> Here is an example of a Python script that reads a dictionary file and prints 100 random lines:
```python
import random

with open('dictionary.txt', 'r') as f:
    lines = f.readlines()

random_lines = random.sample(lines, 100)
for line in random_lines:
    print(line.strip())
```
> This script uses the `random` module to select a random subset of the lines in the file. The `open` function is used to open the file and read its contents into a list of strings, which are then passed to the `random.sample` function to select 100 random lines. Finally, the selected lines are printed using a for loop and the `strip` method to remove any leading or trailing whitespace.
> Note that this script assumes that the dictionary file is located in the same directory as the script and is named "dictionary.txt". You may need to modify the script to specify the correct path and filename for your dictionary file.
Except, I intentionally don’t use either form because they don’t extend nicely (and I dislike using flags when I could use another pipe segment or positional arguments when I could use standard input). I can iterate quickly by adding pipeline segments on top of the basic formula, because all of the segments have the same general shape and “calling convention”.
Finally, because I’ve built up familiarity with the shell over my career, I can come up with this formula as fast as I can type it. At this point, ChatGPT would slow me down: (1) because this sort of thing is basically muscle memory and (2) I have to actually think about the code ChatGPT produces to verify it is correct, which is almost as difficult as producing it in the first place.
> Use reservoir sampling to get 100 random lines from a dictionary that doesn’t fit into memory. Give me an implementation in Python.
> Reservoir sampling is a great choice for randomly selecting a fixed number of items from a stream of unknown size, especially when the entire dataset cannot fit into memory. Here's a Python implementation for selecting 100 random lines from a large file, like a dictionary, using reservoir sampling:
```python
import random

def reservoir_sampling(file_name, k):
    reservoir = []
    with open(file_name, 'r') as file:
        for i, line in enumerate(file):
            if i < k:
                # Fill the reservoir array
                reservoir.append(line.strip())
            else:
                # Randomly replace elements in the reservoir
                # with a decreasing probability.
                # Choose an integer between 0 and i (inclusive)
                j = random.randint(0, i)
                if j < k:
                    reservoir[j] = line.strip()
    return reservoir

# Example usage
file_path = 'path_to_your_large_file.txt'  # Replace with your file path
sampled_lines = reservoir_sampling(file_path, 100)
print(sampled_lines)
```
> This script opens a file and reads through it line by line. For the first 100 lines, it adds them directly to the reservoir. For each subsequent line, it randomly decides whether to replace an existing line in the reservoir with the new line. The probability of replacing decreases as it progresses through the file, ensuring each line has an equal chance of being included in the final sample.
I laid out the constraints, but I did not mention reservoir sampling at all. The script seems to work as expected when I run it against a dictionary file.
Not bad, but suppose the dictionary has n lines and you only want to randomly sample k=100 of them, where n is so huge that you don't want to scan over the whole file at all.
Can you use random access into the file to sample k lines in O(k) time instead of O(n) time?
That is a problematic request for multiple obvious reasons, and for those same reasons, ChatGPT resisted providing an implementation that didn't require indexing the file. When I told it "no indexing is allowed, provide a best effort solution", it relented and provided a best effort solution.
> That is a problematic request for multiple obvious reasons
I'd prefer to think it's more like a real engineering problem, and less like a simple interview question :-)
And it definitely shows the limits of GPT here: it pointed out that the ends of the file might be tricky, but ignored the very conceptually simple solution of considering the file as circular (if you go past either end you simply wrap around).
And it misses the real problem with its implementation: the probability of sampling each line is now directly proportional to the length of the line before it (because it seeks into that line first and then skips it!)
So the word after "begins" is twice as likely to come up as the word after "and".
PS in the case of dictionary words with a length limit of say 30 letters, there is still an O(k) general solution using rejection sampling.
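Roughly the idea, as a sketch (it assumes no line, newline included, exceeds max_line_len bytes, samples with replacement, and uses /usr/share/dict/words only as an example path):

```python
import os
import random

def sample_line_uniform(path, max_line_len=31):
    """Pick one line uniformly at random via seeks plus rejection sampling.

    Assumes no line (newline included) is longer than max_line_len bytes,
    so each draw touches O(1) bytes and k draws cost O(k) expected time.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        while True:
            pos = random.randrange(size)              # uniform random byte
            window_start = max(0, pos - max_line_len)
            f.seek(window_start)
            window = f.read(pos - window_start + 1)
            # The line containing `pos` starts right after the last newline
            # strictly before `pos` (or at the start of the file).
            prev_nl = window.rfind(b"\n", 0, len(window) - 1)
            line_start = 0 if prev_nl == -1 else window_start + prev_nl + 1
            f.seek(line_start)
            line = f.readline()
            # A line is landed on with probability proportional to its
            # length, so accept with probability 1/len(line) to even it out.
            if random.randrange(len(line)) == 0:
                return line.decode().rstrip("\n")

# 100 draws, with replacement; dedupe if distinct words are required.
words = [sample_line_uniform("/usr/share/dict/words") for _ in range(100)]
print(words)
```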
"Remember, this is a probabilistic approach and works well if the lines in your file are roughly the same length. If the line lengths vary significantly, some lines will have a higher or lower chance of being selected."
It had already addressed "the real problem with its implementation" that you pointed out.
> PS in the case of dictionary words with a length limit of say 30 letters, there is still an O(k) general solution using rejection sampling.
Again, what ChatGPT wrote:
"In a typical scenario where lines can have variable lengths, true O(k) random sampling isn't feasible without some prior knowledge about the file."
Knowing that the limit is 30 characters without question counts as "some prior knowledge".
As an interviewer, it sounds like you're not hearing what the candidate is saying.
> And it definitely shows the limits of GPT here
I don't think anyone here is claiming that ChatGPT is limitless. The topic is "a coder considers the waning days of the craft", not "a coder considers the bygone days of the craft." ChatGPT is capable of solving many real world problems already. If it continues improving, some people are concerned about what that could mean, especially for less experienced developers.
How many people have you interviewed with that brainteaser that have actually provided the complete solution you're looking for? Vanishingly few, I would imagine, unless you were dropping some serious hints. It's not a real world problem. Most brainteasers have solutions that are "conceptually simple" once you already know the solution.
> I'd prefer to think it's more like a real engineering problem, and less like a simple interview question
It's absolutely not, though. It's exactly like the infamous trick questions that many tech interviews are known for, which have nothing to do with real engineering that you would encounter on the job.
You might as well have someone invert a binary tree for all the value that it provides.
Yeah it’s not a challenging problem, but the specific code example isn’t the point. Replace his example with “at one point, we wanted a command that would convert a PNG to a JPG….” most of us couldn’t do that without Google.
Took me almost literally two seconds to come up with that, unless your point is that at one point I had to learn it which, yeah, sure. What am I missing?
As the parent comment says, the specifics of the example are irrelevant. Pretend you didn’t know that the dictionary file existed, or where it was stored, or that cat/shuf/head existed, or even how to compose programs in your shell.
That’s quite the assumption! You’d also have to know that your shell maintains a history and know how to search it. None of these things are obvious to those who do not already know them. We all had to learn at some point.
You're missing that some folk haven't learned that yet - and that others have forgotten it.
In 2013 I was using complex ffmpeg commands with filters in there, and could remember the stuff easily. Now I've gotta reference my ffmpeg notes more frequently.
> As of my last knowledge update in January 2022, I can't provide specific details about the latest XKCD comics or their numbers because my training data only includes information available up to that time. However, you can easily find XKCD comics on the official XKCD website (xkcd.com) by using the search function or by browsing through the archives.
> If there's a specific XKCD comic related to the "tar" command that you're looking for, you might be able to find it by searching for keywords like "XKCD tar command" on a search engine or directly on the XKCD website. The website has a good search feature that allows you to find comics based on keywords or topics.
With GPT-4 and browsing enabled, it searches the web and returns:
> The XKCD comic about using the `tar` command can be found at [XKCD Comic #1168](https://xkcd.com/1168/). The comic humorously addresses the complexities and challenges of remembering the correct syntax for the `tar` command, a common scenario for many who use Unix-based systems.
ffmpeg is kind of a special case; the CLI for that is far more complex than almost anything else I've ever seen.
tar xf for extract. Add v to show the files. tar cf to create. tar tf to "test" (list). Add -C dir at the end to extract to different directory. Set file to "-" for stdin.
That's all there's to it for 99% of use cases. I don't get how anyone can have difficulty with that: f for file, and x/c/t are pretty mnemonic.
You've missed the point of the example. We can remember complex things when they're close at hand, and forget even common things when they're not. GPT can help with both cases.
Well, convert does the conversion for you… it’s kind of challenging to come up with an example where there isn’t already a tool/package/library for X. But like pretend there wasn’t one for converting images… you can’t just off-the-cuff that.
Yeah, when I read that I also tried the first thing that popped into my mind - extremely crude/slow but works in bash and only took like 20 seconds without looking anything up:
cat /usr/share/dict/words | while read LINE; do echo "$RANDOM $LINE"; done | sort -h | head -n 100 | awk '{ print $2 }'
I thought something like that might exist and was tempted to check for that, but given the context I really wanted to do it without looking anything up. I see now there's also "shuf".
The problem is not that I don't know the algorithm... the problem is that to implement the algorithm you need to remember what method from what class does the thing you want to do. And knowing that is very difficult, even for seemingly simple problems. That's why Google is the way to program modern programming languages. (Of course I could do the same task without any libraries, but in that case it would be considered bad by most modern programmers, because it would be a sign that I "don't know how to use the language properly"...)
This isn't a complex problem - this is something you do with built-in standard libraries in pretty much any programming language. Like I would expect anyone who claims to know even basic file I/O in their given language to be able to produce a mostly working version of this in less than 30 minutes.
80% of the time they'll produce a subtly biased shuffle, most people don't intuitively know the fisher-yates method unless they've been taught it
like, try this
```python
import collections, random

def shuffle(xs):  # naive intuitive shuffle
    for i in range(len(xs)):
        j = random.randrange(len(xs))
        xs[i], xs[j] = xs[j], xs[i]

def first_of(n):
    xs = list(range(n))
    shuffle(xs)
    return xs[0]

print(collections.Counter([first_of(3) for i in range(100_000)]))
```
note that 1 comes out first 25% more often than 2, but with a fair shuffle all three should have an equal chance of being first
even without running it you can figure out logically that it must be biased in some way because there are 3! = 6 possible permutations and 3³ = 27 equally likely sequences of choices for j (each producing one of those permutations), and 27 is not divisible by 6, so some of those permutations have to be more likely than others
so i think this algorithm is a good one to get from a library if its randomness is important to you
not from chatgpt, which commonly does give you subtly buggy code
if you don't know the algorithm you could easily get it subtly wrong
but the only method involved here is string.strip(), so i don't think this is an example of the problem you describe
```python
import random

n = 100
ws = [line.strip() for line in open('/usr/share/dict/words')]
for i in range(n):
    j = random.randrange(i, len(ws))
    ws[i], ws[j] = ws[j], ws[i]
    print(ws[i])
```
(did i fuck that up)
and yeah probably calling random.sample would be better. and usually import random is better than writing your own rng. but not always
Every time I hear coding referred to as a craft I sort of hate it. On one hand I love the idea of it being a highly skilled craft - knowing the intricacies of the language and the machine to solve any problem, being a respected expert, producing a product worthy of pride. On the other hand, most of what we do is display [some content] on [some page] (probably sloppily due to deadlines), or print 100 random entries from a file...
Not really. My skills have drastically declined over the years because I rarely get a chance to exercise them. On top of that, it seems I'm switching languages/stacks frequently. So it's more like I'm a carpenter for a little while, then a plumber, then an electrician.
I wouldn't expect a carpenter to be skilled with a draw knife without substantial experience using it. The tools you use have a large impact on your performance. Same is true for software when it comes to understanding the intricacies of the tools, which is especially needed when troubleshooting odd issues.
I still maintain that switching from front end Angular to backend data engineering is more like switching from electrician to carpenter. They're both builders, but their tools and objectives are very different. If one doesn't see that, I question how deep into the craft they really are.
Yes, you can go very deep in both front and back end. But the concepts are almost the same... as someone that has had "careers" doing both pixel perfect UIs and large distributed systems on the backend (hundreds of machines, millions of events a second, blahblah).
You can definitely be skilled in a certain tool (language, framework) and that is valuable.
Or you can be the shallow generalist (handyman) running around doing whatever. But the specialist (electrician) is gonna make more than you installing the same outlets.
Especially since that particular task has been done thousands upon thousands of times. I have done it to win arguments on Hacker News[0]. It wouldn't surprise me if the exact code to do it (except for changing file names and number of lines) existed on Stack Overflow.
More likely, the reason why ChatGPT can do it is because all necessary code is available on StackOverflow.
I don't understand unbound excitement about using a paid online tool (!) that does not have reproducible source code (!!) and can sometimes (!!!) generate correct implementations of extremely well-known algorithms if given perfect description.
Somebody who claims 20 years of experience and doesn't at least know instantly (this doesn't require "thinking" about it for minutes) that this is a one-liner in the shell or a few lines of pick-your-poison script - Perl, Python, Ruby, who cares - is not an expert in programming, even if they can't remember the specifics.
And I get it, there are highly specialized people who get very deep in the weeds in maybe HPC or embedded; even so, most of them will at least still be aware. Moreover, these people lack the knowledge to comment on programming or software engineering in the large; I wouldn't call them experts outside their limited domain if they really can't begin to know where to look to do a fucking permutation of a word list.
The old saying that for some 20 years of experience is 1 year and 19 repeats probably applies.